Hubridge

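Streaming Output

Use stream mode to receive incremental responses over SSE (Server-Sent Events)

Enabling Streaming

Set the following in the Chat Completions request body: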

{
  "model": "gpt-4o",
  "stream": true,
  "messages": [
    { "role": "user", "content": "Write a short poem" }
  ]
}

The response is sent as text/event-stream. Each data line begins with data: and the stream ends with data: [DONE].
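For clients that read the event stream without an SDK, the framing described above can be handled with a small line parser. The following is a minimal sketch, not the gateway's reference implementation. It assumes each data: payload is a JSON chunk shaped like the ones the SDK examples on this page read (choices[0].delta.content). The function name iter_sse_deltas and the abbreviated sample payloads are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from the text lines of an SSE response body."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # assumed: some chunks may carry no choices
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            yield delta


# Abbreviated, hypothetical sample of a streamed response body.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # prints "Hello"
```

In a real client the lines would come from the HTTP response body, read line by line, rather than from a list.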

cURL

curl -N -X POST "$YOUR_GATEWAY_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'

-N disables curl's output buffering, so each chunk is printed as soon as it arrives.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="YOUR_GATEWAY_BASE_URL/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

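for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)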

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "YOUR_GATEWAY_BASE_URL/v1",
  apiKey: "YOUR_API_KEY",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

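Notes

  • Streaming and non-streaming requests use the same URL; only the stream field differs.
  • Clients must handle SSE disconnects and timeouts correctly.
  • Billing is still based on actual token usage; see Usage Logs for a detailed breakdown.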
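One way to handle disconnects and timeouts is to reopen the stream with exponential backoff, but only when no output has been delivered yet, since retrying after partial output would repeat text. The following is a minimal sketch under stated assumptions: stream_with_retry, open_stream, and flaky_stream are hypothetical names, and the built-in ConnectionError and TimeoutError stand in for whatever exception types your HTTP client or SDK actually raises.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, Type


def stream_with_retry(
    open_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Iterator[str]:
    """Yield deltas from open_stream(), reopening it on early failures."""
    for attempt in range(1, max_attempts + 1):
        emitted = False
        try:
            for delta in open_stream():
                emitted = True
                yield delta
            return  # stream finished normally
        except retry_on:
            # Give up if text was already delivered or attempts are exhausted.
            if emitted or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


# Simulated stream that fails on the first attempt only.
calls = {"n": 0}


def flaky_stream() -> Iterator[str]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated disconnect")
    yield from ["Hel", "lo"]


print("".join(stream_with_retry(flaky_stream, base_delay=0.01)))  # prints "Hello"
```

A failure in the middle of a response is re-raised rather than retried, leaving the caller to decide whether to discard the partial text and start over.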