Chat Completions
OpenAI-compatible chat completion endpoint. Supports streaming, tool calling, structured output, and multi-provider routing. For hosted MCP tools and the Responses API, see Responses.
ℹ Base URL
All endpoints share the same base URL: https://api.assistiv.ai/v1.
POST /v1/chat/completions
Creates a model response for the given chat conversation. Compatible with the OpenAI API format.
Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model slug from GET /v1/models (e.g. gpt-4o, claude-3-5-sonnet, gemini-pro). Always discover via the models endpoint; never hardcode. |
| messages | array | Required | Array of message objects with role ("system", "user", "assistant", "tool") and content. |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events (SSE). Default: false |
| temperature | number | Optional | Sampling temperature between 0 and 2. |
| max_tokens | integer | Optional | Maximum number of tokens to generate. For GPT-5 family models (which route to OpenAI's Responses API under the hood), the minimum is 16. Values below 16 return 422. |
| top_p | number | Optional | Nucleus sampling parameter. |
| tools | array | Optional | List of tool definitions for function calling. Each tool has type, function name, description, and parameters schema. |
| tool_choice | string \| object | Optional | Controls tool use: "auto", "none", "required", or a specific tool. |
| response_format | object | Optional | Response format constraint. Use {"type": "json_object"} for JSON mode. |
| frequency_penalty | number | Optional | Penalizes repeated tokens. Range: -2.0 to 2.0. |
| presence_penalty | number | Optional | Penalizes tokens already present. Range: -2.0 to 2.0. |
| stop | string \| array | Optional | Up to 4 sequences where the model will stop. |
| stream_options | object | Optional | Options for streaming. Use {"include_usage": true} to get usage in the final chunk. |
curl -X POST https://api.assistiv.ai/v1/chat/completions \
-H "Authorization: Bearer sk-eu_your_key" \
-H "Content-Type: application/json" \
-d '{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

# Response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Pre-flight Checks
Before calling the upstream LLM, the gateway runs these checks in order. If any fails, the call returns immediately with no upstream charge:
1. Rate limit: the key's inference rate limit is checked. Returns 429 if exceeded.
2. Budget suspension: if is_suspended=true, returns 402 with code budget_suspended.
3. Budget remaining: remaining_usd must be > 0. Returns 402 with code budget_exhausted.
4. Wallet balance: returns 402 with code wallet_insufficient.

| error.code | Fix |
|---|---|
| budget_suspended | Admin action — PATCH /budget { is_suspended: false } |
| budget_exhausted | User action — topup, upgrade plan, wait for period reset |
| wallet_insufficient | Platform action — top up the wallet via Stripe checkout |
⚠ Parse error.code, not just the status
All three failures return 402 Payment Required but require different UX. Parse error.code to show the right message.
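A minimal client-side sketch of this branching. It assumes the error body follows the OpenAI-style `{"error": {"code": ...}}` envelope; only `error.code` itself is guaranteed by the table above, and the hint strings are illustrative, not part of the API.

```python
# Map gateway error codes to user-facing guidance. The hint wording is
# illustrative; only the error.code values come from the docs.
PAYMENT_ERROR_HINTS = {
    "budget_suspended": "Budget is suspended; ask an admin to un-suspend it.",
    "budget_exhausted": "Budget is used up; top up, upgrade, or wait for the period reset.",
    "wallet_insufficient": "Platform wallet is empty; top it up via Stripe checkout.",
}

def explain_payment_error(status: int, body: dict) -> str:
    """Return the right UX message for a 402, keyed on error.code."""
    if status != 402:
        return f"HTTP {status}"
    code = body.get("error", {}).get("code")
    return PAYMENT_ERROR_HINTS.get(code, f"Payment required (code={code})")
```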
Streaming
Set stream: true to receive a stream of Server-Sent Events. Each event contains a delta with partial content. The stream ends with data: [DONE].
Use stream_options: {"include_usage": true} to receive token usage in the final chunk before [DONE].
curl -X POST https://api.assistiv.ai/v1/chat/completions \
-H "Authorization: Bearer sk-eu_your_key" \
-H "Content-Type: application/json" \
-d '{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "Write a haiku" }
  ],
  "stream": true,
  "stream_options": { "include_usage": true }
}'
# Response (Server-Sent Events):
# data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Gentle"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" breeze"},...}]}
# ...
# data: {"id":"chatcmpl-...","usage":{"prompt_tokens":11,...}}
# data: [DONE]

ℹ Stream interruption
If the upstream LLM connection drops mid-stream, the gateway treats the interruption as finish_reason: "stop". Tokens already emitted are billed normally — no partial rollback. If you re-send the request after a drop, you will be billed for both attempts. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final [DONE] sentinel.
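One way to apply that rule client-side is to track whether the `[DONE]` sentinel arrived. A sketch that parses raw SSE `data:` lines, assuming only the chunk shape shown above:

```python
import json

def consume_stream(lines):
    """Consume SSE 'data:' lines; return (text, completed).

    completed is True only if the final [DONE] sentinel arrived, which is
    how a finished stream is distinguished from an interrupted one.
    """
    parts, done = [], False
    for raw in lines:
        if not raw.startswith("data: "):
            continue
        payload = raw[len("data: "):].strip()
        if payload == "[DONE]":
            done = True
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), done
```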
Tool Calling / Function Calling
Pass tool definitions in the tools array. The model may respond with a tool_calls array instead of regular content. Execute the tool and send the result back as a message with role: "tool". Loop until finish_reason === "stop".
Tool definitions follow the OpenAI function calling format and work across all supported providers (OpenAI, Anthropic, Google, xAI).
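The loop can be sketched as a small driver function. Here `call_model` and the `tools` mapping are hypothetical stand-ins for your own model call and local tool implementations, not part of the gateway API; the message and tool_call shapes follow the JSON examples in this section.

```python
import json

def run_tool_loop(call_model, messages, tools):
    """Drive the tool-calling loop until the model stops requesting tools.

    call_model(messages) -> one choice dict: {"message": ..., "finish_reason": ...}
    tools maps a function name to a local Python callable.
    """
    while True:
        choice = call_model(messages)
        message = choice["message"]
        if choice["finish_reason"] != "tool_calls":
            return message["content"]
        messages.append(message)  # echo the assistant turn with its tool_calls
        for call in message["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = tools[call["function"]["name"]](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
```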
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "What's the weather in London?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string", "description": "City name" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

// Model returns tool_calls instead of content:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"London\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

// Execute the tool, then send the result back:
{
  "messages": [
    ...previous_messages,
    { "role": "tool", "tool_call_id": "call_abc", "content": "15°C, cloudy" }
  ]
}

// Loop until finish_reason === "stop"

Provider Routing
The gateway automatically routes each request to the correct provider:
- The model slug maps to one or more provider configurations
- The provider must have a valid, active API key for your platform
- The provider must be enabled in the model-provider config
- If multiple providers match, selection is by priority, then weight
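For illustration, priority-then-weight selection could look like the following sketch. The field names (`priority`, `weight`) and the tie-breaking rule (lower priority value wins, then a weighted random draw) are assumptions for this example, not the gateway's actual implementation:

```python
import random

def pick_provider(candidates):
    """Pick among matching providers: take the best-priority group
    (lowest value here), then a weighted random draw within it."""
    best = min(c["priority"] for c in candidates)
    pool = [c for c in candidates if c["priority"] == best]
    return random.choices(pool, weights=[c["weight"] for c in pool], k=1)[0]
```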
Using with OpenAI SDK
Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.
from openai import OpenAI

client = OpenAI(
    api_key="sk-eu_your_end_user_key",
    base_url="https://api.assistiv.ai/v1",
)

# Discover which models are actually enabled
models = client.models.list()
model_slug = models.data[0].id  # or pick by preference

response = client.chat.completions.create(
    model=model_slug,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "sk-eu_your_end_user_key",
  baseURL: "https://api.assistiv.ai/v1",
});

// Discover which models are actually enabled
const models = await client.models.list();
const modelSlug = models.data[0].id;

const response = await client.chat.completions.create({
  model: modelSlug,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);