[ STEP 04 / INTEGRATION ]
Chat Completions
OpenAI-compatible chat completion endpoint. Supports streaming, tool calling, structured output, and multi-provider routing. For hosted MCP tools and the Responses API, see Responses.
ℹ Base URL
All endpoints share the same base URL: https://api.assistiv.ai/v1.
POST /v1/chat/completions
Creates a model response for the given chat conversation. Compatible with the OpenAI API format.
Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model slug from GET /v1/models (e.g. gpt-4o, claude-3-5-sonnet, gemini-pro). Always discover via the models endpoint; never hardcode. |
| messages | array | Required | Array of message objects with role ("system", "user", "assistant", "tool") and content. |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events (SSE). Default: false. |
| temperature | number | Optional | Sampling temperature between 0 and 2. |
| max_tokens | integer | Optional | Maximum number of tokens to generate. For GPT-5 family models (which route to OpenAI's Responses API under the hood), the minimum is 16. Values below 16 return 422. |
| top_p | number | Optional | Nucleus sampling parameter. |
| tools | array | Optional | List of tool definitions for function calling. Each tool has type, function name, description, and parameters schema. |
| tool_choice | string or object | Optional | Controls tool use: "auto", "none", "required", or a specific tool. |
| response_format | object | Optional | Response format constraint. Use {"type": "json_object"} for JSON mode. |
| frequency_penalty | number | Optional | Penalizes repeated tokens. Range: -2.0 to 2.0. |
| presence_penalty | number | Optional | Penalizes tokens already present. Range: -2.0 to 2.0. |
| stop | string or array | Optional | Up to 4 sequences where the model will stop. |
| stream_options | object | Optional | Options for streaming. Use {"include_usage": true} to get usage in the final chunk. |
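The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Assistiv SDK: `validate_chat_request` is a hypothetical helper, and the rule that GPT-5 family slugs start with `gpt-5` is an assumption made for illustration.

```python
from typing import Any


def validate_chat_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a chat-completions request body."""
    problems: list[str] = []

    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")
    if not isinstance(body.get("messages"), list):
        problems.append("messages is required and must be an array")

    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    for name in ("frequency_penalty", "presence_penalty"):
        value = body.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")

    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    # Assumption: GPT-5 family slugs start with "gpt-5".
    max_tokens = body.get("max_tokens")
    if (
        isinstance(model, str)
        and model.startswith("gpt-5")
        and max_tokens is not None
        and max_tokens < 16
    ):
        problems.append("max_tokens must be at least 16 for GPT-5 family models")

    return problems
```

An empty list means the body passed these local checks; the gateway remains the source of truth and may still reject the request.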
import { Assistiv } from "@assistiv/sdk";
// Use the end-user's sk-eu_* key for inference (not the platform key).
const assistiv = new Assistiv({ apiKey: endUserKey });
const reply = await assistiv.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of France?" },
],
temperature: 0.7,
max_tokens: 256,
});
console.log(reply.choices[0].message.content);
// Streaming — same call, with stream: true.
const stream = await assistiv.chat.completions.create({
model: "gpt-4o",
stream: true,
messages: [{ role: "user", content: "Count to 5." }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1711234567,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8,
"total_tokens": 33
}
}
Pre-flight Checks
Before calling the upstream LLM, the gateway runs these checks in order. If any fails, the call returns immediately with no upstream charge:
1. Rate limit on inference. Returns 429 if exceeded.
2. Budget suspension. If is_suspended=true, returns 402 with code budget_suspended.
3. Budget remaining. remaining_usd must be > 0. Returns 402 with code budget_exhausted.
4. Platform wallet balance. Returns 402 with code wallet_insufficient.

| error.code | Fix |
|---|---|
| budget_suspended | Admin action: PATCH /budget { is_suspended: false } |
| budget_exhausted | User action: top up, upgrade plan, or wait for the period reset |
| wallet_insufficient | Platform action: top up the wallet via Stripe checkout |
⚠ Parse error.code, not just the status
All three failures return 402 Payment Required but require different UX. Parse error.code to show the right message.
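One way to branch on error.code is a small lookup keyed by the three codes above. This is a sketch under assumptions: the error body is assumed to use the OpenAI-style envelope `{"error": {"code": ...}}`, `message_for_error` is a hypothetical helper, and the message strings are placeholders to replace with your own copy.

```python
import json

# Placeholder copy; only the codes themselves come from the table above.
MESSAGES_402 = {
    "budget_suspended": "Your access is suspended. Please contact your administrator.",
    "budget_exhausted": "Your budget is used up. Top up, upgrade, or wait for the reset.",
    "wallet_insufficient": "The service is temporarily unavailable. Please try again later.",
}


def message_for_error(status: int, body_text: str) -> str:
    """Map a failed response to a user-facing message using error.code."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = {}

    # Assumption: errors look like {"error": {"code": "..."}}.
    error = payload.get("error") if isinstance(payload, dict) else None
    code = error.get("code") if isinstance(error, dict) else None

    if status == 402 and code in MESSAGES_402:
        return MESSAGES_402[code]
    if status == 429:
        return "Too many requests. Please retry shortly."
    return f"Request failed with status {status}."
```

Falling back to a generic message keeps the client usable if a new code appears or the body is not valid JSON.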
Streaming
Set stream: true to receive a stream of Server-Sent Events. Each event contains a delta with partial content. The stream ends with data: [DONE].
Use stream_options: {"include_usage": true} to receive token usage in the final chunk before [DONE].
curl -X POST https://api.assistiv.ai/v1/chat/completions \
-H "Authorization: Bearer sk-eu_your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{ "role": "user", "content": "Write a haiku" }
],
"stream": true,
"stream_options": { "include_usage": true }
}'
# Response (Server-Sent Events):
# data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Gentle"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" breeze"},...}]}
# ...
# data: {"id":"chatcmpl-...","usage":{"prompt_tokens":11,...}}
# data: [DONE]
ℹ Stream interruption
If the upstream LLM connection drops mid-stream, the gateway treats the interruption as finish_reason: "stop". Tokens already emitted are billed normally — no partial rollback. If you re-send the request after a drop, you will be billed for both attempts. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final [DONE] sentinel.
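The completed-versus-interrupted check can be sketched as a small SSE consumer that accumulates delta content, records usage when present, and notes whether [DONE] arrived. `consume_sse` and `StreamResult` are hypothetical names, and the input is assumed to be an iterable of already-decoded text lines, such as those an HTTP client yields from the response body.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamResult:
    text: str
    usage: Optional[dict]
    completed: bool  # True only if the [DONE] sentinel was seen


def consume_sse(lines: Iterable[str]) -> StreamResult:
    """Accumulate delta content from SSE lines and track stream completion."""
    parts: list[str] = []
    usage = None
    completed = False

    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            completed = True
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)

    return StreamResult("".join(parts), usage, completed)
```

If `completed` is false when the iterator ends, the stream was cut short; since a retry is billed as a second attempt, a client may prefer to show the partial text and let the user decide whether to re-send.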
Tool calling (this is how agents work today)
Agents on Assistiv are not a separate REST surface. Define your tool schema in the tools array on a chat-completions call, run a loop that feeds tool responses back as messages, and you have an agent. The MCP route (see MCP) is the same idea with hosted execution.
Pass tool definitions in the tools array. The model may respond with a tool_calls array instead of regular content. Execute the tool and send the result back as a message with role: "tool". Loop until finish_reason === "stop".
Tool definitions follow the OpenAI function calling format and work across all supported providers (OpenAI, Anthropic, Google, xAI).
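The loop described above can be sketched as follows. This is an illustration rather than an Assistiv SDK feature: `run_agent`, `TOOLS`, and the local `get_weather` stub are hypothetical, and `create` stands for any function that sends the message history (with your tools array) to the chat completions endpoint and returns the JSON response as a dict.

```python
import json
from typing import Any, Callable


# Hypothetical local tool; a real one would call a weather service.
def get_weather(location: str) -> str:
    return f"{location}: 15 degrees C, cloudy"


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_agent(
    create: Callable[[list[dict[str, Any]]], dict[str, Any]],
    messages: list[dict[str, Any]],
    max_turns: int = 5,
) -> str:
    """Call `create` repeatedly until the model stops requesting tools."""
    for _ in range(max_turns):
        choice = create(messages)["choices"][0]
        message = choice["message"]
        messages.append(message)  # keep the assistant turn in the history

        # Any finish_reason other than "tool_calls" ends the loop
        # ("stop" in the normal case).
        if choice["finish_reason"] != "tool_calls":
            return message.get("content") or ""

        for call in message.get("tool_calls", []):
            fn = TOOLS[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": fn(**args),
            })

    raise RuntimeError("agent did not finish within max_turns")
```

The `max_turns` cap is a design choice added here so that a model which keeps requesting tools cannot loop, and bill, indefinitely.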
{
"model": "gpt-4o",
"messages": [
{ "role": "user", "content": "What's the weather in London?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}
// Model returns tool_calls instead of content:
{
"choices": [{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_abc",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"London\"}"
}
}]
},
"finish_reason": "tool_calls"
}]
}
// Execute the tool, then send result back:
{
"messages": [
...previous_messages,
{ "role": "tool", "tool_call_id": "call_abc", "content": "15°C, cloudy"}
]
}
// Loop until finish_reason === "stop"
Provider Routing
The gateway automatically routes each request to a provider according to these rules:
- The model slug maps to one or more provider configurations.
- A provider must have a valid, active API key for your platform.
- A provider must be enabled in the model-provider config.
- If multiple providers match, selection is by priority, then weight.
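To make "priority then weight" concrete, here is a hedged sketch of one plausible selection scheme. It is not the gateway's actual implementation: `ProviderConfig` and `select_provider` are hypothetical, and two assumptions are made explicit in the code, namely that a lower priority number wins and that weight is a relative share used for a weighted random pick within the winning tier.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderConfig:
    name: str
    enabled: bool
    has_active_key: bool
    priority: int  # assumption: lower number wins
    weight: float  # assumption: relative share within a priority tier


def select_provider(
    candidates: list[ProviderConfig],
    rng: Optional[random.Random] = None,
) -> Optional[ProviderConfig]:
    """Filter to eligible providers, then pick by priority and weight."""
    rng = rng or random.Random()

    # Zero-weight entries are skipped so the weighted pick is always valid.
    eligible = [
        c for c in candidates
        if c.enabled and c.has_active_key and c.weight > 0
    ]
    if not eligible:
        return None

    best = min(c.priority for c in eligible)
    tier = [c for c in eligible if c.priority == best]
    return rng.choices(tier, weights=[c.weight for c in tier], k=1)[0]
```

Passing a seeded `random.Random` makes the choice reproducible in tests.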
Using with OpenAI SDK
Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.
from openai import OpenAI
client = OpenAI(
api_key="sk-eu_your_end_user_key",
base_url="https://api.assistiv.ai/v1",
)
# Discover which models are actually enabled
models = client.models.list()
model_slug = models.data[0].id # or pick by preference
response = client.chat.completions.create(
model=model_slug,
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({
apiKey: "sk-eu_your_end_user_key",
baseURL: "https://api.assistiv.ai/v1",
});
// Discover which models are actually enabled
const models = await client.models.list();
const modelSlug = models.data[0].id;
const response = await client.chat.completions.create({
model: modelSlug,
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
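The examples above take the first model returned. To pick by preference instead, a small helper can walk an ordered list of preferred slugs and fall back to the first enabled model. `pick_model` is a hypothetical helper, and the preference list is illustrative.

```python
from typing import Iterable, Optional, Sequence


def pick_model(available: Iterable[str], preferences: Sequence[str]) -> Optional[str]:
    """Return the first preferred slug that is enabled, else the first available."""
    slugs = list(available)
    enabled = set(slugs)
    for slug in preferences:
        if slug in enabled:
            return slug
    return slugs[0] if slugs else None
```

With the Python client shown earlier, this could be called as `pick_model((m.id for m in client.models.list().data), ["gpt-4o", "claude-3-5-sonnet"])`. A `None` result means no models are enabled for the key.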