rosalie602/qwen3.5

Forked from holmesian/hmz/qwen3.5 (5 commits behind main)

CNB AI Proxy - OpenAI Compatible API Gateway

Run local large language models on the CNB platform and serve them through an OpenAI-compatible API. Supports streaming, CORS, health checks, and more.

Features

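🚀 OpenAI-compatible /v1/chat/completions endpoint
🔄 Streaming and non-streaming responses
🌐 CORS support, so the API can be called directly from a browser
🔐 Secure authentication based on CNB_TOKEN
📊 Debug mode for inspecting requests and responses
💚 Health check endpoint
🛠️ One-command startup script with automatic process management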

Quick Start

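1. Fork the repository and start the cloud-native development environment

Fork this repository into your own organization
Select a branch and click Cloud-Native Development (云原生开发) to start the remote development environment
After about 5 to 9 seconds you will be in the remote development shell

2. Set environment variables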


export CNB_TOKEN="your_token_here"

Optional environment variables:


export CNB_API_ENDPOINT="https://api.cnb.cool"  # default
export CNB_REPO_SLUG="holmesian/hmz"           # default
export PORT="8888"                              # proxy service port, default 8888
export DEBUG="true"                             # enable debug mode

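Each variable resolves in the same way: an exported value wins, otherwise the documented default applies. The Python sketch below mirrors that lookup for illustration only; it is not the logic in main.go, and the `ProxyConfig`/`load_config` names, the rule that CNB_TOKEN is mandatory, and the rule that only the string "true" enables debug mode are all assumptions.

```python
import os
from dataclasses import dataclass


@dataclass
class ProxyConfig:
    token: str
    api_endpoint: str
    repo_slug: str
    port: int
    debug: bool


def load_config(env=None) -> ProxyConfig:
    """Resolve proxy settings from environment variables using the README defaults."""
    env = os.environ if env is None else env
    token = env.get("CNB_TOKEN", "")
    if not token:
        # Assumption: the proxy cannot authenticate upstream without a token.
        raise ValueError("CNB_TOKEN is required")
    return ProxyConfig(
        token=token,
        api_endpoint=env.get("CNB_API_ENDPOINT", "https://api.cnb.cool"),
        repo_slug=env.get("CNB_REPO_SLUG", "holmesian/hmz"),
        port=int(env.get("PORT", "8888")),
        # Assumption: only "true" (case-insensitive) turns debug mode on.
        debug=env.get("DEBUG", "").lower() == "true",
    )
```

Passing a plain dict instead of `os.environ` keeps the lookup easy to check in isolation.
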
3. Start the proxy service


./start-proxy.sh

Once the service is running:

API endpoint: http://localhost:8888/v1/chat/completions
Health check: http://localhost:8888/health
Log file: /tmp/cnb-proxy.log

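The proxy appears to run in the background (it logs to /tmp/cnb-proxy.log and is stopped with pkill), so a script that sends requests immediately after ./start-proxy.sh may race it. A minimal Python sketch that polls the health endpoint until it is ready; the assumption that a healthy proxy answers /health with HTTP 200 is not stated in this README, and `wait_for_health` is a hypothetical helper.

```python
import time
import urllib.request


def wait_for_health(url="http://localhost:8888/health", timeout=10.0, interval=0.5):
    """Poll the health endpoint until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                # Assumption: a healthy proxy answers /health with HTTP 200.
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or an HTTP error status
            # (urllib.error.URLError and HTTPError both subclass OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```
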
API Usage Examples

Non-streaming request


curl -X POST http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself"}
    ]
  }'
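
The same non-streaming call can be made without curl or the OpenAI SDK. This is a sketch using only Python's standard library; the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and the response is assumed to follow the usual OpenAI chat-completion shape (`choices[0].message.content`), which is what an OpenAI-compatible endpoint should return.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888/v1"


def build_chat_request(messages, model="auto", stream=False, base_url=BASE_URL):
    """Build a POST request equivalent to the curl example."""
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Return the assistant text from an OpenAI-style chat completion."""
    return response_json["choices"][0]["message"]["content"]


def chat(prompt):
    """Send one user message to the proxy and return the reply text."""
    req = build_chat_request([{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

With the proxy running, `print(chat("Hello, please introduce yourself"))` prints the model's answer.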

Streaming request


curl -X POST http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Write a poem about spring"}
    ],
    "stream": true
  }'
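
With "stream": true, OpenAI-compatible servers send server-sent events: one `data: {...}` line per chunk, with the new text in `choices[].delta.content`, and a final `data: [DONE]`. The README only says streaming is supported, so the assumption here is that the proxy relays chunks in that standard format. A minimal Python parser sketch; `iter_stream_content` is a hypothetical name.

```python
import json


def iter_stream_content(lines):
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # Skip blank keep-alive lines, comments, and other SSE fields.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # Sentinel that ends an OpenAI-style stream.
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` can be passed in directly, because iterating over it yields the body line by line as bytes.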

Using the OpenAI SDK

Python example:


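from openai import OpenAI

# The curl examples send no Authorization header, so this key is only a
# placeholder; the SDK raises an error if no key is configured at all
client = OpenAI(
    api_key="dummy",
    base_url="http://localhost:8888/v1"  # the proxy's OpenAI-compatible root
)

response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)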

Common Commands

View logs


tail -f /tmp/cnb-proxy.log

Stop the service


pkill -x cnb-proxy

Restart the service


# Use ";" rather than "&&": pkill exits non-zero when no proxy is running
pkill -x cnb-proxy; ./start-proxy.sh

Health check


curl http://localhost:8888/health

Debug Mode

Enable debug mode to see detailed request and response logs:


export DEBUG=true
./start-proxy.sh

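The debug log includes:

The raw incoming request
The request forwarded to the CNB API
The CNB API response status and body
A summary of how the request was handled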

Running Models Locally (Optional)

This repository includes several Qwen3.5 models that can be run with Ollama:

Start Ollama


./start-ollama.sh

Available models

qwen3.5:0.8b
qwen3.5:2b
qwen3.5:4b
qwen3.5:9b
qwen3.5:27b
qwen3.5:35b

Run a model


ollama run qwen3.5:35b
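
Besides the interactive `ollama run`, a running Ollama server can be called over HTTP. The sketch below uses Ollama's native `/api/chat` endpoint with Python's standard library. It assumes start-ollama.sh leaves Ollama on its default address `http://localhost:11434` and that the `qwen3.5:0.8b` tag from the list is available to it; `ollama_chat_request`, `ollama_reply`, and `ask_ollama` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: start-ollama.sh keeps Ollama on its default address and port.
OLLAMA_URL = "http://localhost:11434"


def ollama_chat_request(prompt, model="qwen3.5:0.8b"):
    """Build a non-streaming request for Ollama's native /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # Ollama streams by default; request a single JSON object.
    }
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_reply(response_json):
    """A non-streaming /api/chat response carries the answer in message.content."""
    return response_json["message"]["content"]


def ask_ollama(prompt, model="qwen3.5:0.8b"):
    with urllib.request.urlopen(ollama_chat_request(prompt, model)) as resp:
        return ollama_reply(json.load(resp))
```

Ollama also serves an OpenAI-compatible API under `/v1`, so the SDK example should also work against it with `base_url="http://localhost:11434/v1"` and a local model tag in place of `"auto"`.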

Start llama-server


./start-llama.sh

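Public Access

Add port 8888 in the PORTS settings of the CNB console to expose the proxy service to the public internet.

Test Scripts

The project provides test scripts for verifying the API: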


# Test calling the CNB API directly
./call-ai-api.sh "Test message"

# Test the GLM model
./test-glm-api.sh "Test message"

Error Handling

The proxy converts CNB API errors into an OpenAI-compatible format:


{
  "error": {
    "message": "Error details",
    "type": "CNB API Error",
    "code": 401,
    "details": {}
  }
}

Common errors:

credentials expired: CNB_TOKEN has expired; update it
invalid token: CNB_TOKEN is invalid
rate limit exceeded: the call rate limit has been exceeded

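A client can turn this error body into an actionable hint. The Python sketch below assumes that the common error strings appear somewhere inside `error.message`, which the README does not state outright; `ERROR_HINTS` and `explain_error` are hypothetical names.

```python
import json

# Hints taken from the README's list of common errors.
# Assumption: these strings appear inside error.message.
ERROR_HINTS = {
    "credentials expired": "CNB_TOKEN has expired; update it.",
    "invalid token": "CNB_TOKEN is invalid; check the exported value.",
    "rate limit exceeded": "Call rate limit exceeded; slow down and retry later.",
}


def explain_error(body):
    """Return (code, hint) for an OpenAI-compatible error body from the proxy."""
    error = json.loads(body).get("error", {})
    message = str(error.get("message", ""))
    hint = next(
        (text for key, text in ERROR_HINTS.items() if key in message.lower()),
        message or "Unknown error",  # Fall back to the raw message.
    )
    return error.get("code"), hint
```
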
Project Structure


workspace/
├── main.go                # Go proxy service (main program)
├── start-proxy.sh         # Proxy service startup script
├── call-ai-api.sh         # CNB API call test script
├── test-glm-api.sh        # GLM model test script
├── start-ollama.sh        # Ollama startup script
├── start-llama.sh         # llama-server startup script
├── models/                # Local model files
├── bin/                   # llama-server executables
└── unsloth/               # GGUF model files