Assistiv Docs

Responses API

OpenAI-compatible Responses endpoint. Supports structured input, streaming, function calling, and hosted MCP tools. It runs parallel to Chat Completions but matches the OpenAI Responses API shape.

Base URL

All endpoints share the same base URL: https://api.assistiv.ai/v1.

POST /v1/responses

Creates a model response for the given input. Compatible with the OpenAI Responses API format.

Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).

Request Body

model (string, required): Model slug from GET /v1/models (e.g. gpt-4o, claude-3-5-sonnet, gemini-pro). Always discover via the models endpoint; never hardcode.
input (string | array, required): Shorthand string for simple prompts, or a structured array of input items for multi-turn conversations. See Structured Input below.
instructions (string, optional): System prompt. Equivalent to a system message in chat completions.
tools (array, optional): Tool definitions for function calling. Same format as chat completions, plus type: "mcp" for hosted MCP tools. See MCP Tools below.
tool_choice (string | object, optional): Controls tool use: "auto", "none", "required", or a specific tool.
previous_response_id (string, optional): Continue from a prior response. The gateway loads the previous conversation context automatically.
thread_id (string, optional): Agent thread identifier. Groups multiple responses into a single conversation thread.
stream (boolean, optional, default false): If true, returns a stream of Server-Sent Events (SSE).
temperature (number, optional): Sampling temperature between 0 and 2.
top_p (number, optional): Nucleus sampling parameter.
max_output_tokens (integer, optional): Maximum number of tokens to generate. For GPT-5 family models, the minimum is 16; values below 16 return 422.
response_format (object, optional): Response format constraint. Use {"type": "json_object"} for JSON mode.
bash
curl -X POST https://api.assistiv.ai/v1/responses \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is the capital of France?"
  }'
json
{
  "id": "resp_abc",
  "object": "response",
  "created_at": 1744113000,
  "status": "completed",
  "model": "gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_abc",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 30,
    "total_tokens": 45
  }
}

Structured Input

The input field accepts either a plain string (shorthand for a single user message) or an array of input items for multi-turn conversations.

Each item in the array has a role (user, assistant, system) and content. Use the instructions field for the system prompt instead of including a system role in the input array.

json
{
  "model": "gpt-4o",
  "instructions": "You are a helpful assistant.",
  "input": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    {
      "role": "user",
      "content": "What about Germany?"
    }
  ]
}
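When migrating from chat completions, the mapping above (system messages become instructions, the remaining turns stay in input) can be sketched as a small helper. This is illustrative only; split_messages is a hypothetical name, not part of any SDK:

```python
def split_messages(messages):
    """Map chat-completions-style messages onto Responses API fields:
    system messages are folded into `instructions`, the remaining
    user/assistant turns become the structured `input` array."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    return {
        "instructions": "\n".join(system_parts) or None,
        "input": [m for m in messages if m["role"] != "system"],
    }


payload = split_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Germany?"},
])
# payload["instructions"] == "You are a helpful assistant."
# payload["input"] holds the three non-system turns
```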

Hosted MCP Tools

Pass hosted MCP tools with type: "mcp" in the tools array. The gateway connects to the MCP server and discovers its available tools; the model can then invoke them while generating the response.

For full details on OAuth setup, available apps, and tool configuration, see the MCP / Tools page.

json
{
  "model": "gpt-4o",
  "input": "Create a GitHub issue titled 'Bug report'",
  "tools": [
    {
      "type": "mcp",
      "server_label": "github",
      "server_url": "https://mcp.assistiv.ai/mcp",
      "require_approval": "never",
      "allowed_tools": ["create_issue", "list_issues"]
    }
  ]
}

MCP tools are Responses-only

Hosted MCP tools (type: "mcp") are only supported on /v1/responses. Sending type: "mcp" to /v1/chat/completions returns 422 Unprocessable Entity.
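A client-side guard can catch this before a wasted round trip to the gateway. A minimal sketch, not gateway code; detecting the endpoint by path suffix and the helper name assert_tools_supported are assumptions:

```python
def assert_tools_supported(endpoint: str, tools: list) -> None:
    """Raise early if hosted MCP tools are about to be sent to an endpoint
    that rejects them, mirroring the gateway's 422 on /v1/chat/completions."""
    if endpoint.rstrip("/").endswith("/chat/completions"):
        mcp_labels = [t.get("server_label", "?") for t in tools if t.get("type") == "mcp"]
        if mcp_labels:
            raise ValueError(
                f"MCP tools {mcp_labels} are only supported on /v1/responses"
            )


# Fine: MCP tools on the Responses endpoint
assert_tools_supported("/v1/responses", [{"type": "mcp", "server_label": "github"}])
# Raises ValueError: same tools on chat completions
```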

max_output_tokens minimum

For GPT-5 family models, max_output_tokens must be at least 16 per OpenAI's contract. Values below 16 return 422 Unprocessable Entity.
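A client can enforce this minimum before sending the request. A minimal sketch; detecting the GPT-5 family by slug prefix is an assumption, since authoritative slugs come from GET /v1/models:

```python
GPT5_MIN_OUTPUT_TOKENS = 16


def check_max_output_tokens(model: str, max_output_tokens: int) -> None:
    """Client-side mirror of the gateway's 422: GPT-5 family models
    require max_output_tokens >= 16. Prefix matching on the slug is an
    assumption for illustration only."""
    if model.startswith("gpt-5") and max_output_tokens < GPT5_MIN_OUTPUT_TOKENS:
        raise ValueError(
            f"max_output_tokens must be >= {GPT5_MIN_OUTPUT_TOKENS} for {model}"
        )
```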

Function Calling

Pass function tool definitions in the tools array with type: "function". When the model decides to call a function, the response contains an output item with type: "function_call".

Execute the function on your side, then send the result back using previous_response_id and an input item with type: "function_call_output". Continue the loop until the model returns a text message.

json
{
  "model": "gpt-4o",
  "input": "What's the weather in London?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
json
// Response with function_call output:
{
  "id": "resp_abc",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc",
      "call_id": "call_abc",
      "name": "get_weather",
      "arguments": "{\"location\":\"London\"}"
    }
  ]
}

// Send the result back using previous_response_id:
{
  "model": "gpt-4o",
  "previous_response_id": "resp_abc",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc",
      "output": "15°C, cloudy"
    }
  ]
}
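The round trip above generalizes into a loop. The sketch below works over plain JSON dicts; `create` stands in for whatever HTTP client or SDK call posts to /v1/responses, and run_tool_loop and handlers are hypothetical names for illustration:

```python
import json


def run_tool_loop(create, model, user_input, tools, handlers, max_turns=5):
    """Call the model, execute any function_call output items with local
    handler functions, and feed results back via previous_response_id
    until the model returns a plain message (or max_turns is hit)."""
    response = create(model=model, input=user_input, tools=tools)
    for _ in range(max_turns):
        calls = [o for o in response["output"] if o["type"] == "function_call"]
        if not calls:
            return response  # model answered with text; the loop is done
        results = [
            {
                "type": "function_call_output",
                "call_id": c["call_id"],
                "output": handlers[c["name"]](**json.loads(c["arguments"])),
            }
            for c in calls
        ]
        response = create(
            model=model,
            previous_response_id=response["id"],
            input=results,
            tools=tools,
        )
    raise RuntimeError("function-calling loop did not settle")
```

handlers maps each function name to a local Python callable; its return value is sent back verbatim as the function_call_output string.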

Streaming

Set stream: true to receive a stream of Server-Sent Events. Unlike chat completions, the Responses API uses typed event names.

Streaming Events

response.created: Response object created
response.output_item.added: New output item started
response.content_part.added: Content part started within an output item
response.output_text.delta: Incremental text chunk
response.output_text.done: Text output complete
response.content_part.done: Content part finished
response.output_item.done: Output item finished
response.completed: Full response complete (includes usage)
bash
curl -X POST https://api.assistiv.ai/v1/responses \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Write a haiku about code",
    "stream": true
  }'

# Response (Server-Sent Events):
# event: response.created
# data: {"type":"response.created","response":{"id":"resp_abc",...}}
#
# event: response.output_item.added
# data: {"type":"response.output_item.added",...}
#
# event: response.content_part.added
# data: {"type":"response.content_part.added",...}
#
# event: response.output_text.delta
# data: {"type":"response.output_text.delta","delta":"Gentle "}
#
# event: response.output_text.delta
# data: {"type":"response.output_text.delta","delta":"keystrokes"}
#
# event: response.output_text.done
# data: {"type":"response.output_text.done","text":"..."}
#
# event: response.completed
# data: {"type":"response.completed","response":{...,"usage":{...}}}

Stream interruption

If the upstream LLM connection drops mid-stream, tokens already emitted are billed normally. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final response.completed event.
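The typed events above can be folded into final text on the client, with a flag that separates a clean finish from an interrupted stream. A minimal sketch over already-parsed SSE data payloads; collect_stream is a hypothetical helper, not an SDK function:

```python
def collect_stream(events):
    """Fold parsed SSE data payloads into the final text, recording
    whether the terminal response.completed event ever arrived."""
    chunks, completed = [], False
    for event in events:
        if event["type"] == "response.output_text.delta":
            chunks.append(event["delta"])
        elif event["type"] == "response.completed":
            completed = True
    return "".join(chunks), completed


# Events from the transcript above:
text, completed = collect_stream([
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Gentle "},
    {"type": "response.output_text.delta", "delta": "keystrokes"},
    {"type": "response.completed"},
])
# text == "Gentle keystrokes"; completed is True. A dropped connection
# leaves completed False even though some (billed) text was received.
```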

Pre-flight Checks

The Responses API runs the same pre-flight and billing pipeline as Chat Completions. Before calling the upstream LLM, the gateway checks in order:

1. Model validation: The model exists, the provider config is enabled, and the key's scope includes inference.
2. Rate limit check: User override → platform default → pass-through. Returns 429 if exceeded.
3. Suspension check: If the user's budget has is_suspended=true, returns 402 with code budget_suspended.
4. Budget check: If the user has a budget, remaining_usd must be > 0. Returns 402 with code budget_exhausted.
5. Wallet check: The platform wallet must cover the estimated cost. Returns 402 with code wallet_insufficient.

Parse error.code, not just the status

All three payment failures return 402 Payment Required but require different UX. Parse error.code to show the right message. See Chat Completions pre-flight for the full error code table.
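Mapping error.code to UX can be a simple lookup. A sketch only: the exact error envelope shape ({"error": {"code": ...}}) and the messages below are assumptions for illustration:

```python
PAYMENT_MESSAGES = {
    "budget_suspended": "Your access is suspended. Contact your administrator.",
    "budget_exhausted": "You've used up your budget for this period.",
    "wallet_insufficient": "The platform wallet can't cover this request.",
}


def payment_error_message(body: dict) -> str:
    """Pick a user-facing message from a 402 response body by error.code,
    falling back to a generic message for unknown codes."""
    code = body.get("error", {}).get("code")
    return PAYMENT_MESSAGES.get(code, "Payment required. Please try again later.")


msg = payment_error_message({"error": {"code": "budget_exhausted"}})
# msg == "You've used up your budget for this period."
```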

Using with OpenAI SDK

Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-eu_your_end_user_key",
    base_url="https://api.assistiv.ai/v1",
)

# Simple string input
response = client.responses.create(
    model="gpt-4o",
    input="What is the capital of France?",
)
print(response.output[0].content[0].text)
typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-eu_your_end_user_key",
  baseURL: "https://api.assistiv.ai/v1",
});

const response = await client.responses.create({
  model: "gpt-4o",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);