Assistiv Docs

Chat Completions

OpenAI-compatible chat completion endpoint. Supports streaming, tool calling, structured output, and multi-provider routing. For hosted MCP tools and the Responses API, see Responses.

Base URL

All endpoints share the same base URL: https://api.assistiv.ai/v1.

POST /v1/chat/completions

Creates a model response for the given chat conversation. Compatible with the OpenAI API format.

Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).

Request Body

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `model` | string | Required | Model slug from `GET /v1/models` (e.g. `gpt-4o`, `claude-3-5-sonnet`, `gemini-pro`). Always discover via the models endpoint; never hardcode. |
| `messages` | array | Required | Array of message objects with `role` (`"system"`, `"user"`, `"assistant"`, `"tool"`) and `content`. |
| `stream` | boolean | Optional | If `true`, returns a stream of Server-Sent Events (SSE). Default: `false`. |
| `temperature` | number | Optional | Sampling temperature between 0 and 2. |
| `max_tokens` | integer | Optional | Maximum number of tokens to generate. For GPT-5 family models (which route to OpenAI's Responses API under the hood), the minimum is 16. Values below 16 return `422`. |
| `top_p` | number | Optional | Nucleus sampling parameter. |
| `tools` | array | Optional | List of tool definitions for function calling. Each tool has a `type`, function `name`, `description`, and `parameters` schema. |
| `tool_choice` | string \| object | Optional | Controls tool use: `"auto"`, `"none"`, `"required"`, or a specific tool. |
| `response_format` | object | Optional | Response format constraint. Use `{"type": "json_object"}` for JSON mode. |
| `frequency_penalty` | number | Optional | Penalizes repeated tokens. Range: -2.0 to 2.0. |
| `presence_penalty` | number | Optional | Penalizes tokens already present. Range: -2.0 to 2.0. |
| `stop` | string \| array | Optional | Up to 4 sequences where the model will stop. |
| `stream_options` | object | Optional | Options for streaming. Use `{"include_usage": true}` to get usage in the final chunk. |
```bash
curl -X POST https://api.assistiv.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```

Pre-flight Checks

Before calling the upstream LLM, the gateway runs these checks in order. If any fails, the call returns immediately with no upstream charge:

1. Model validation: Model exists, provider config enabled, scope includes inference.
2. Rate limit check: User override → platform default → pass-through. Returns 429 if exceeded.
3. Suspension check: If the user's budget has is_suspended=true, returns 402 with code budget_suspended.
4. Budget check: If the user has a budget, remaining_usd must be > 0. Returns 402 with code budget_exhausted.
5. Wallet check: Platform wallet must cover estimated cost. Returns 402 with code wallet_insufficient.

| `error.code` | Fix |
|--------------|-----|
| `budget_suspended` | Admin action: `PATCH /budget { is_suspended: false }` |
| `budget_exhausted` | User action: top up, upgrade plan, or wait for the period reset |
| `wallet_insufficient` | Platform action: top up the wallet via Stripe checkout |

Parse error.code, not just the status

All three failures return 402 Payment Required but require different UX. Parse error.code to show the right message.
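
A minimal client-side dispatch on `error.code` could look like this. The helper name and the message strings are illustrative, not part of the API; only the three codes come from the table above:

```python
def action_for_402(body: dict) -> str:
    """Map a 402 response body to a user-facing message, keyed on error.code."""
    code = body.get("error", {}).get("code")
    actions = {
        "budget_suspended": "Your account is suspended. Contact your admin.",
        "budget_exhausted": "Budget used up. Top up, upgrade, or wait for the period reset.",
        "wallet_insufficient": "Platform wallet is empty. Top up via Stripe checkout.",
    }
    return actions.get(code, "Payment required (unknown reason).")
```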

Streaming

Set stream: true to receive a stream of Server-Sent Events. Each event contains a delta with partial content. The stream ends with data: [DONE].

Use stream_options: {"include_usage": true} to receive token usage in the final chunk before [DONE].

```bash
curl -X POST https://api.assistiv.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ],
    "stream": true,
    "stream_options": { "include_usage": true }
  }'

# Response (Server-Sent Events):
# data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Gentle"},...}]}
# data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" breeze"},...}]}
# ...
# data: {"id":"chatcmpl-...","usage":{"prompt_tokens":11,...}}
# data: [DONE]
```
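
A sketch of parsing individual SSE lines into typed events, assuming the chunk shapes shown above. The helper name and the `(kind, payload)` return convention are our own, not part of the API:

```python
import json

def parse_sse_line(line: str):
    """Classify one SSE line as ("content", text), ("usage", dict),
    ("done", None), or ("other", ...) based on the chunk shapes above."""
    if not line.startswith("data: "):
        return ("other", None)          # comments, keep-alives, blank lines
    data = line[len("data: "):]
    if data == "[DONE]":
        return ("done", None)           # terminal sentinel
    chunk = json.loads(data)
    choices = chunk.get("choices") or []
    if choices and "content" in choices[0].get("delta", {}):
        return ("content", choices[0]["delta"]["content"])
    if chunk.get("usage"):
        return ("usage", chunk["usage"])  # final chunk when include_usage is set
    return ("other", chunk)             # e.g. the initial role-only delta
```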

Stream interruption

If the upstream LLM connection drops mid-stream, the gateway treats the interruption as finish_reason: "stop". Tokens already emitted are billed normally — no partial rollback. If you re-send the request after a drop, you will be billed for both attempts. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final [DONE] sentinel.
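
One way to make that client-side check concrete: treat a stream as completed only if the `[DONE]` sentinel was observed, and gate any retry on that flag since a retry is billed as a second attempt. The helper name is illustrative:

```python
def stream_completed(sse_lines) -> bool:
    """True only if the [DONE] sentinel arrived; anything else means the
    stream was interrupted (the gateway still reports finish_reason "stop")."""
    return any(line.strip() == "data: [DONE]" for line in sse_lines)
```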

Tool Calling / Function Calling

Pass tool definitions in the tools array. The model may respond with a tool_calls array instead of regular content. Execute the tool and send the result back as a message with role: "tool". Loop until finish_reason === "stop".

Tool definitions follow the OpenAI function calling format and work across all supported providers (OpenAI, Anthropic, Google, xAI).

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "What's the weather in London?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```

```json
// Model returns tool_calls instead of content:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"London\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

// Execute the tool, then send result back:
{
  "messages": [
    ...previous_messages,
    { "role": "tool", "tool_call_id": "call_abc", "content": "15°C, cloudy" }
  ]
}
// Loop until finish_reason === "stop"
```
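
The execute-and-reply step can be sketched as a small dispatcher that turns an assistant `tool_calls` message into the `role: "tool"` messages to append. The `TOOLS` registry, its weather stub, and the helper name are hypothetical:

```python
import json

# Hypothetical local registry mapping tool names to Python callables.
TOOLS = {"get_weather": lambda location: f"15°C, cloudy in {location}"}

def run_tool_calls(assistant_message: dict) -> list:
    """Execute each tool call and build the role:"tool" messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])   # arguments arrive as a JSON string
        output = TOOLS[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],      # must echo the call id back
            "content": output,
        })
    return results
```

Append the returned messages to the conversation and call the endpoint again, repeating until `finish_reason` is `"stop"`.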

Provider Routing

The gateway automatically routes requests to the correct provider based on:

  • The model slug must map to one or more provider configurations
  • The provider must have a valid, active API key for your platform
  • The provider must be enabled in the model-provider config
  • If multiple providers match, selection is by priority, then weight
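
As an illustration of the last rule, priority-then-weight selection might look like the sketch below. The config field names and the lower-number-wins priority convention are assumptions, not documented gateway internals:

```python
import random

def pick_provider(configs):
    """Keep enabled configs, take the best (lowest-numbered) priority tier,
    then pick weighted-random within that tier."""
    enabled = [c for c in configs if c["enabled"]]
    if not enabled:
        raise LookupError("no enabled provider for this model")
    best = min(c["priority"] for c in enabled)
    tier = [c for c in enabled if c["priority"] == best]
    return random.choices(tier, weights=[c["weight"] for c in tier])[0]

# Hypothetical configs: two tier-1 providers split 80/20, one tier-2 fallback.
providers = [
    {"name": "openai-primary", "enabled": True, "priority": 1, "weight": 80},
    {"name": "openai-backup", "enabled": True, "priority": 1, "weight": 20},
    {"name": "azure-openai", "enabled": True, "priority": 2, "weight": 100},
]
```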

Using with OpenAI SDK

Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-eu_your_end_user_key",
    base_url="https://api.assistiv.ai/v1",
)

# Discover which models are actually enabled
models = client.models.list()
model_slug = models.data[0].id  # or pick by preference

response = client.chat.completions.create(
    model=model_slug,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-eu_your_end_user_key",
  baseURL: "https://api.assistiv.ai/v1",
});

// Discover which models are actually enabled
const models = await client.models.list();
const modelSlug = models.data[0].id;

const response = await client.chat.completions.create({
  model: modelSlug,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```