
Chat Completions

API v0.1.0 · SDK @assistiv/sdk@0.2.0

OpenAI-compatible chat completion endpoint. Supports streaming, tool calling, structured output, and multi-provider routing. For hosted MCP tools and the Responses API, see Responses.

ℹ Base URL

All endpoints share the same base URL: https://api.assistiv.ai/v1.

POST /v1/chat/completions

Creates a model response for the given chat conversation. Compatible with the OpenAI API format.

Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).

Request Body

Name	Type	Required	Description
model	string	Required	Model slug from GET /v1/models (e.g. gpt-5.6, claude-sonnet-5, gemini-3.5-flash). Always discover via the models endpoint; never hardcode.
messages	array	Required	Array of message objects with role ("system", "user", "assistant", "tool") and content.
stream	boolean	Optional	If true, returns a stream of Server-Sent Events (SSE). Default: `false`
temperature	number	Optional	Sampling temperature between 0 and 2.
max_tokens	integer	Optional	Maximum number of tokens to generate. For GPT-5 family models (which route to OpenAI's Responses API under the hood), the minimum is 16. Values below 16 return 422.
top_p	number	Optional	Nucleus sampling parameter.
tools	array	Optional	List of tool definitions for function calling. Each tool has type, function name, description, and parameters schema.
tool_choice	string | object	Optional	Controls tool use: "auto", "none", "required", or a specific tool.
response_format	object	Optional	Response format constraint. Use {"type": "json_object"} for JSON mode.
frequency_penalty	number	Optional	Penalizes repeated tokens. Range: -2.0 to 2.0.
presence_penalty	number	Optional	Penalizes tokens already present. Range: -2.0 to 2.0.
stop	string | array	Optional	Up to 4 sequences where the model will stop.
stream_options	object	Optional	Options for streaming. Use {"include_usage": true} to get usage in the final chunk.
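
The documented ranges can be checked on the client before a request is sent, which avoids a round trip for obviously invalid input. The sketch below is illustrative and not part of the SDK: `build_chat_request` is a hypothetical helper, and treating any slug that starts with `gpt-5` as a GPT-5 family model is an assumption.

```python
from typing import Any

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def build_chat_request(
    model: str, messages: list[dict[str, Any]], **options: Any
) -> dict[str, Any]:
    """Check the documented ranges and return a JSON-ready request body."""
    if not model:
        raise ValueError("model is required")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    for name in ("frequency_penalty", "presence_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    max_tokens = options.get("max_tokens")
    # Assumption: GPT-5 family slugs can be recognised by the "gpt-5" prefix.
    if max_tokens is not None and model.startswith("gpt-5") and max_tokens < 16:
        raise ValueError("max_tokens must be at least 16 for GPT-5 family models")

    # Leave unset options out so the gateway applies its own defaults.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({key: value for key, value in options.items() if value is not None})
    return body
```

Calling `build_chat_request("gpt-5.6", messages, max_tokens=8)` raises `ValueError` locally instead of waiting for the gateway's 422.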

typescript

import { Assistiv } from "@assistiv/sdk";

// Use the end-user's sk-eu_* key for inference (not the platform key).
const assistiv = new Assistiv({ apiKey: endUserKey });

const reply = await assistiv.chat.completions.create({
  model: "gpt-5.6",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(reply.choices[0].message.content);

// Streaming — same call, with stream: true.
const stream = await assistiv.chat.completions.create({
  model: "gpt-5.6",
  stream: true,
  messages: [{ role: "user", content: "Count to 5." }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-5.6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Pre-flight Checks

Before calling the upstream LLM, the gateway runs these checks in order. If any fails, the call returns immediately with no upstream charge:

1. Model validation: Model exists, provider config enabled, scope includes inference.

2. Rate limit check: User override → platform default → pass-through. Returns 429 if exceeded.

3. Suspension check: If the user's budget has is_suspended=true, returns 402 with code budget_suspended.

4. Budget check: If the user has a budget, remaining_usd must be > 0. Returns 402 with code budget_exhausted.

5. Wallet check: Platform wallet must cover estimated cost. Returns 402 with code wallet_insufficient.

error.code	Fix
`budget_suspended`	Admin action: PATCH /budget { is_suspended: false }
`budget_exhausted`	User action: top up, upgrade the plan, or wait for the period reset
`wallet_insufficient`	Platform action: top up the wallet via Stripe checkout

⚠ Parse error.code, not just the status

All three failures return 402 Payment Required but require different UX. Parse error.code to show the right message.
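
One way to branch on the code is a small lookup keyed by error.code. This is a sketch only: `explain_payment_error` is a hypothetical helper, the messages are placeholders, and the body shape `{"error": {"code": ...}}` is inferred from the `error.code` path above rather than confirmed by it.

```python
import json
from typing import Optional

# error.code -> (who has to act, suggested message), following the table above.
PAYMENT_FIXES = {
    "budget_suspended": ("admin", "Ask an admin to lift the budget suspension."),
    "budget_exhausted": ("user", "Top up, upgrade the plan, or wait for the period reset."),
    "wallet_insufficient": ("platform", "Top up the platform wallet."),
}


def explain_payment_error(status: int, body: str) -> Optional[tuple[str, str]]:
    """Return (actor, message) for a 402 response, or None for any other status."""
    if status != 402:
        return None
    try:
        # Assumed body shape: {"error": {"code": "..."}}
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        code = None
    if not isinstance(code, str):
        code = None
    # Unknown or unparseable codes fall back to a generic message.
    return PAYMENT_FIXES.get(code, ("unknown", "Payment required."))
```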

Streaming

Set stream: true to receive a stream of Server-Sent Events. Each event contains a delta with partial content. The stream ends with data: [DONE].

Use stream_options: {"include_usage": true} to receive token usage in the final chunk before [DONE].

bash

curl -X POST https://api.assistiv.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.6",
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ],
    "stream": true,
    "stream_options": { "include_usage": true }
  }'

# Response (Server-Sent Events):
# data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Gentle"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" breeze"},...}]}
# ...
# data: {"id":"chatcmpl-...","usage":{"prompt_tokens":11,...}}
# data: [DONE]

ℹ Stream interruption

If the upstream LLM connection drops mid-stream, the gateway treats the interruption as finish_reason: "stop". Tokens already emitted are billed normally — no partial rollback. If you re-send the request after a drop, you will be billed for both attempts. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final [DONE] sentinel.
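
A client can apply this rule by recording whether the [DONE] sentinel arrived while it accumulates deltas. The sketch below works on already-decoded SSE lines. `consume_sse` and `StreamResult` are hypothetical names, and it assumes the transport reports a dropped connection as `ConnectionError` or `TimeoutError`; the exact exception depends on the HTTP client in use.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class StreamResult:
    text: str                        # concatenated delta.content values
    usage: Optional[dict[str, Any]]  # present only if a usage chunk arrived
    completed: bool                  # True only if [DONE] was received


def consume_sse(lines: Iterable[str]) -> StreamResult:
    """Accumulate streamed content and report whether the stream finished."""
    parts: list[str] = []
    usage: Optional[dict[str, Any]] = None
    try:
        for raw in lines:
            line = raw.strip()
            if not line.startswith("data:"):
                continue  # skip blank separators and non-data lines
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return StreamResult("".join(parts), usage, completed=True)
            chunk = json.loads(payload)
            usage = chunk.get("usage") or usage
            for choice in chunk.get("choices") or []:
                content = (choice.get("delta") or {}).get("content")
                if content:
                    parts.append(content)
    except (ConnectionError, TimeoutError):
        pass  # assumption: a mid-stream drop surfaces as one of these
    # The iterator ended or failed without [DONE]: treat as interrupted.
    return StreamResult("".join(parts), usage, completed=False)
```

When `completed` is false, the partial `text` has already been billed, so it is worth deciding whether to show it before re-sending the request.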

Tool calling (this is how agents work today)

Agents on Assistiv are not a separate REST surface. Define your tool schema in the tools array on a chat-completions call, run a loop that feeds tool responses back as messages, and you have an agent. The MCP route (see MCP) is the same idea with hosted execution.

Pass tool definitions in the tools array. The model may respond with a tool_calls array instead of regular content. Execute the tool and send the result back as a message with role: "tool". Loop until finish_reason === "stop".

Tool definitions follow the OpenAI function calling format and work across all supported providers (OpenAI, Anthropic, Google, xAI).

json

{
  "model": "gpt-5.6",
  "messages": [
    { "role": "user", "content": "What's the weather in London?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

json

// Model returns tool_calls instead of content:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"London\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

// Execute the tool, then send the result back. The history must include the assistant message that carried tool_calls:
{
  "messages": [
    ...previous_messages,
    { "role": "tool", "tool_call_id": "call_abc", "content": "15°C, cloudy"}
  ]
}
// Loop until finish_reason === "stop"
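
The loop described above can be sketched independently of any HTTP client. In this sketch `create` is a hypothetical callable that takes a request body and returns the parsed response (for example a thin wrapper around POST /v1/chat/completions), `tool_impls` maps tool names to local Python functions, and `max_turns` is an added safety limit that the API itself does not impose.

```python
import json
from typing import Any, Callable, Optional

Request = dict[str, Any]
Response = dict[str, Any]


def run_tool_loop(
    create: Callable[[Request], Response],
    tool_impls: dict[str, Callable[..., Any]],
    request: Request,
    max_turns: int = 5,
) -> Optional[str]:
    """Call the model, execute requested tools, and repeat until it answers."""
    messages = list(request["messages"])
    for _ in range(max_turns):
        response = create({**request, "messages": messages})
        message = response["choices"][0]["message"]
        tool_calls = message.get("tool_calls")
        if not tool_calls:
            return message.get("content")  # no tool requested: final answer
        # Keep the assistant turn so each tool result has a call to refer to.
        messages.append(message)
        for call in tool_calls:
            function = call["function"]
            arguments = json.loads(function["arguments"])  # JSON-encoded string
            result = tool_impls[function["name"]](**arguments)
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            )
    raise RuntimeError("tool loop did not finish within max_turns")
```

Passing `create` in as a parameter keeps the loop testable with canned responses and lets the same function sit on top of either SDK shown on this page.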

Provider Routing

The gateway automatically routes requests to the correct provider based on:

The model slug maps to one or more provider configurations
Provider must have a valid, active API key for your platform
Provider must be enabled in the model-provider config
If multiple providers match, selection is by priority then weight
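
As a rough model of "priority then weight", the sketch below filters out ineligible providers, keeps the best priority tier, and draws one provider at random in proportion to its weight. The gateway's actual algorithm is not documented here, so several details are assumptions: the `ProviderConfig` field names are hypothetical, a lower priority number is taken to mean "preferred", and weights are treated as relative probabilities.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProviderConfig:
    name: str
    enabled: bool          # enabled in the model-provider config
    has_active_key: bool   # valid, active API key for the platform
    priority: int          # assumption: lower number wins
    weight: float          # assumption: relative share within a priority tier


def select_provider(
    configs: Sequence[ProviderConfig], rng: Optional[random.Random] = None
) -> Optional[ProviderConfig]:
    """Pick one eligible provider by priority, then by weighted random draw."""
    rng = rng or random.Random()
    eligible = [
        c for c in configs if c.enabled and c.has_active_key and c.weight > 0
    ]
    if not eligible:
        return None
    best = min(c.priority for c in eligible)
    tier = [c for c in eligible if c.priority == best]
    return rng.choices(tier, weights=[c.weight for c in tier], k=1)[0]
```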

Using with OpenAI SDK

Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-eu_your_end_user_key",
    base_url="https://api.assistiv.ai/v1",
)

# Discover which models are actually enabled
models = client.models.list()
model_slug = models.data[0].id  # or pick by preference

response = client.chat.completions.create(
    model=model_slug,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

typescript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-eu_your_end_user_key",
  baseURL: "https://api.assistiv.ai/v1",
});

// Discover which models are actually enabled
const models = await client.models.list();
const modelSlug = models.data[0].id;

const response = await client.chat.completions.create({
  model: modelSlug,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
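
Taking `models.data[0]` uses whichever model happens to be listed first. A preference-ordered lookup is a small step up. The sketch below is one way to do it: `pick_model` is a hypothetical helper, and the fallback to the first available slug is a design choice rather than documented behaviour.

```python
from typing import Iterable, Optional, Sequence


def pick_model(available: Iterable[str], preferences: Sequence[str]) -> Optional[str]:
    """Return the first preferred slug that is enabled, else the first available."""
    slugs = list(available)
    enabled = set(slugs)
    for slug in preferences:
        if slug in enabled:
            return slug
    # None of the preferred models is enabled: fall back, or None if empty.
    return slugs[0] if slugs else None
```

With the OpenAI client above, the call would look like `pick_model([m.id for m in client.models.list().data], ["gpt-5.6", "claude-sonnet-5"])`.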