Streaming output
Use stream mode to receive incremental SSE responses
Enable streaming
Set the following in the Chat Completions request body:
    {
      "model": "gpt-4o",
      "stream": true,
      "messages": [
        { "role": "user", "content": "Write a short poem" }
      ]
    }

The response is sent as text/event-stream. Each event line begins with data:, and the stream ends with data: [DONE].
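For clients that read the raw stream without an SDK, the line format described above can be parsed by hand. The following is a minimal sketch, not a complete SSE implementation: the helper name iter_sse_deltas and the sample payloads are illustrative assumptions, and the chunk shape (choices[0].delta.content) is taken from the SDK examples on this page.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from the lines of a text/event-stream body."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data line
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The sentinel marks the end of the stream
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


# Hypothetical sample lines, shaped like the chunks the SDK examples consume
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # prints: Hello
```

The helper accepts any iterable of decoded lines, so it can be fed from whichever HTTP client is in use.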
cURL
    curl -N -X POST "$YOUR_GATEWAY_BASE_URL/v1/chat/completions" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o",
        "stream": true,
        "messages": [{ "role": "user", "content": "Hello!" }]
      }'

The -N flag disables curl's output buffering so that chunks are displayed as they arrive.
Python (OpenAI SDK)
    from openai import OpenAI

    client = OpenAI(
        base_url="YOUR_GATEWAY_BASE_URL/v1",
        api_key="YOUR_API_KEY",
    )

    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    # Print each content fragment as soon as it arrives
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

Node.js (OpenAI SDK)
    import OpenAI from "openai";

    const client = new OpenAI({
      baseURL: "YOUR_GATEWAY_BASE_URL/v1",
      apiKey: "YOUR_API_KEY",
    });

    const stream = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello!" }],
      stream: true,
    });

    // Write each content fragment to stdout as soon as it arrives
    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }

Notes
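- Streaming and non-streaming requests use the same URL; only the stream field differs
- Clients must handle SSE disconnections and timeouts correctly
- Billing is still based on actual token usage; see Usage Logs for details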
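
One way to handle disconnections and timeouts is to restart the request when the stream breaks. The sketch below is a hedged illustration using only the standard library: collect_with_retry, open_stream, and flaky_stream are hypothetical names, and the built-in ConnectionError and TimeoutError stand in for whatever exceptions the HTTP client or SDK actually raises. It assumes a broken stream cannot be resumed, so partial output is discarded and the request is reissued from the start.

```python
import time
from typing import Callable, Iterable


def collect_with_retry(
    open_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> str:
    """Consume a stream of text deltas, restarting from scratch if it breaks."""
    for attempt in range(1, max_attempts + 1):
        parts = []
        try:
            for delta in open_stream():
                parts.append(delta)
            return "".join(parts)
        except (ConnectionError, TimeoutError):
            # Partial output is dropped: a new request produces a new completion
            if attempt == max_attempts:
                raise
            # Exponential backoff before the next attempt
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Simulated stream that disconnects on the first attempt only
calls = {"count": 0}


def flaky_stream():
    calls["count"] += 1
    yield "Hel"
    if calls["count"] == 1:
        raise ConnectionError("simulated disconnect")
    yield "lo"


print(collect_with_retry(flaky_stream, backoff_seconds=0.0))  # prints: Hello
```

In real use, open_stream would issue the streaming request and yield each content fragment. Because every retry is a new request, each attempt may be billed separately under the usage-based billing noted above.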
