Responses API
OpenAI-compatible Responses endpoint. Supports structured input, streaming, function calling, and hosted MCP tools. It parallels Chat Completions but follows the OpenAI Responses API shape.
ℹ Base URL
All endpoints share the same base URL: https://api.assistiv.ai/v1.
POST /v1/responses
Creates a model response for the given input. Compatible with the OpenAI Responses API format.
Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model slug from GET /v1/models (e.g. gpt-4o, claude-3-5-sonnet, gemini-pro). Always discover via the models endpoint; never hardcode. |
| input | string \| array | Required | Shorthand string for simple prompts, or a structured array of input items for multi-turn conversations. See Structured Input below. |
| instructions | string | Optional | System prompt. Equivalent to a system message in chat completions. |
| tools | array | Optional | Tool definitions for function calling. Same format as chat completions, plus type: "mcp" for hosted MCP tools. See MCP Tools below. |
| tool_choice | string \| object | Optional | Controls tool use: "auto", "none", "required", or a specific tool. |
| previous_response_id | string | Optional | Continue from a prior response. The gateway loads the previous conversation context automatically. |
| thread_id | string | Optional | Agent thread identifier. Groups multiple responses into a single conversation thread. |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events (SSE). Default: false. |
| temperature | number | Optional | Sampling temperature between 0 and 2. |
| top_p | number | Optional | Nucleus sampling parameter. |
| max_output_tokens | integer | Optional | Maximum number of tokens to generate. For GPT-5 family models, the minimum is 16. Values below 16 return 422. |
| response_format | object | Optional | Response format constraint. Use {"type": "json_object"} for JSON mode. |
```shell
curl -X POST https://api.assistiv.ai/v1/responses \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is the capital of France?"
  }'
```

Response:

```json
{
  "id": "resp_abc",
  "object": "response",
  "created_at": 1744113000,
  "status": "completed",
  "model": "gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_abc",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 30,
    "total_tokens": 45
  }
}
```

Structured Input
The input field accepts either a plain string (shorthand for a single user message) or an array of input items for multi-turn conversations.
Each item in the array has a role (user, assistant, system) and content. Use the instructions field for the system prompt instead of including a system role in the input array.
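For callers migrating from chat completions, the split between instructions and input can be done mechanically. The helper below is a hypothetical sketch (not part of any SDK): it hoists system messages into instructions and passes the remaining turns through as input items.

```python
def to_responses_payload(model: str, messages: list[dict]) -> dict:
    """Convert a chat-completions message array into a Responses request body.

    System messages are hoisted into the instructions field; user and
    assistant turns pass through unchanged as input items.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    payload = {
        "model": model,
        "input": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        payload["instructions"] = "\n".join(system_parts)
    return payload
```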
```json
{
  "model": "gpt-4o",
  "instructions": "You are a helpful assistant.",
  "input": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    {
      "role": "user",
      "content": "What about Germany?"
    }
  ]
}
```

Hosted MCP Tools
Pass hosted MCP tools with type: "mcp" in the tools array. The gateway connects to the MCP server, discovers available tools, and the model can invoke them during the response.
For full details on OAuth setup, available apps, and tool configuration, see the MCP / Tools page.
```json
{
  "model": "gpt-4o",
  "input": "Create a GitHub issue titled 'Bug report'",
  "tools": [
    {
      "type": "mcp",
      "server_label": "github",
      "server_url": "https://mcp.assistiv.ai/mcp",
      "require_approval": "never",
      "allowed_tools": ["create_issue", "list_issues"]
    }
  ]
}
```

⚠ MCP tools are Responses-only
Hosted MCP tools (type: "mcp") are only supported on /v1/responses. Sending type: "mcp" to /v1/chat/completions returns 422 Unprocessable Entity.
ℹ max_output_tokens minimum
For GPT-5 family models, max_output_tokens must be at least 16 per OpenAI's contract. Values below 16 return 422 Unprocessable Entity.
Function Calling
Pass function tool definitions in the tools array with type: "function". When the model decides to call a function, the response contains an output item with type: "function_call".
Execute the function on your side, then send the result back using previous_response_id and an input item with type: "function_call_output". Continue the loop until the model returns a text message.
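The translation step in that loop can be sketched without an SDK. In the snippet below, get_weather is a hypothetical local implementation, and function_call_outputs only shows how function_call output items become function_call_output input items for the follow-up request; everything else (HTTP, retries) is omitted.

```python
import json

def get_weather(location: str) -> str:
    """Hypothetical local implementation; swap in a real lookup."""
    return "15°C, cloudy"

# Registry mapping tool names the model may call to local callables.
LOCAL_FUNCTIONS = {"get_weather": get_weather}

def function_call_outputs(response: dict) -> list[dict]:
    """Turn function_call output items into function_call_output input items,
    to be sent back with previous_response_id set to the response's id."""
    outputs = []
    for item in response["output"]:
        if item["type"] != "function_call":
            continue
        fn = LOCAL_FUNCTIONS[item["name"]]
        args = json.loads(item["arguments"])  # arguments arrive as a JSON string
        outputs.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": fn(**args),
        })
    return outputs
```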
```json
{
  "model": "gpt-4o",
  "input": "What's the weather in London?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
```

Response with a function_call output item:

```json
{
  "id": "resp_abc",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc",
      "call_id": "call_abc",
      "name": "get_weather",
      "arguments": "{\"location\":\"London\"}"
    }
  ]
}
```

Send the result back using previous_response_id:

```json
{
  "model": "gpt-4o",
  "previous_response_id": "resp_abc",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc",
      "output": "15°C, cloudy"
    }
  ]
}
```

Streaming
Set stream: true to receive a stream of Server-Sent Events. Unlike chat completions, the Responses API uses typed event names.
Streaming Events
| Event | Description |
|---|---|
| response.created | Response object created |
| response.output_item.added | New output item started |
| response.content_part.added | Content part started within an output item |
| response.output_text.delta | Incremental text chunk |
| response.output_text.done | Text output complete |
| response.content_part.done | Content part finished |
| response.output_item.done | Output item finished |
| response.completed | Full response complete (includes usage) |

```shell
curl -X POST https://api.assistiv.ai/v1/responses \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Write a haiku about code",
    "stream": true
  }'

# Response (Server-Sent Events):
# event: response.created
# data: {"type":"response.created","response":{"id":"resp_abc",...}}
#
# event: response.output_item.added
# data: {"type":"response.output_item.added",...}
#
# event: response.content_part.added
# data: {"type":"response.content_part.added",...}
#
# event: response.output_text.delta
# data: {"type":"response.output_text.delta","delta":"Gentle "}
#
# event: response.output_text.delta
# data: {"type":"response.output_text.delta","delta":"keystrokes"}
#
# event: response.output_text.done
# data: {"type":"response.output_text.done","text":"..."}
#
# event: response.completed
# data: {"type":"response.completed","response":{...,"usage":{...}}}
```

ℹ Stream interruption
If the upstream LLM connection drops mid-stream, tokens already emitted are billed normally. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final response.completed event.
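One way to apply that check on the client is to fold the typed events into accumulated text plus a completion flag. This sketch assumes the SSE stream has already been parsed into (event name, data) pairs; the helper itself is illustrative, not part of any SDK.

```python
def collect_stream(events):
    """Fold typed streaming events into (text, completed, usage).

    completed stays False unless a response.completed event arrives,
    which is how an interrupted stream is distinguished from a
    completed one.
    """
    parts, completed, usage = [], False, None
    for name, data in events:
        if name == "response.output_text.delta":
            parts.append(data["delta"])
        elif name == "response.completed":
            completed = True
            usage = data["response"].get("usage")
    return "".join(parts), completed, usage
```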
Pre-flight Checks
The Responses API runs the same pre-flight and billing pipeline as Chat Completions. Before calling the upstream LLM, the gateway checks in order:
1. Rate limit for inference. Returns 429 if exceeded.
2. Suspension: if is_suspended=true, returns 402 with code budget_suspended.
3. Budget: remaining_usd must be > 0. Returns 402 with code budget_exhausted.
4. Wallet balance: returns 402 with code wallet_insufficient.

⚠ Parse error.code, not just the status
All three payment failures return 402 Payment Required but require different UX. Parse error.code to show the right message. See Chat Completions pre-flight for the full error code table.
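A sketch of that branching, assuming the OpenAI-style error envelope {"error": {"code": ...}}; the user-facing strings here are placeholders to replace with your product's own copy.

```python
# Placeholder UX copy keyed by error.code; adjust wording to your product.
MESSAGES_402 = {
    "budget_suspended": "Your budget is suspended. Contact your administrator.",
    "budget_exhausted": "Your budget is used up for this period.",
    "wallet_insufficient": "The platform wallet has insufficient funds.",
}

def message_for_402(body: dict) -> str:
    """Map a 402 response body to user-facing copy via error.code."""
    code = (body.get("error") or {}).get("code")
    return MESSAGES_402.get(code, "Payment required.")
```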
Using with OpenAI SDK
Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-eu_your_end_user_key",
    base_url="https://api.assistiv.ai/v1",
)

# Simple string input
response = client.responses.create(
    model="gpt-4o",
    input="What is the capital of France?",
)
print(response.output[0].content[0].text)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-eu_your_end_user_key",
  baseURL: "https://api.assistiv.ai/v1",
});

const response = await client.responses.create({
  model: "gpt-4o",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);
```
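To honor the "never hardcode" rule, one option is to choose a slug at startup from the parsed JSON of GET /v1/models. A minimal sketch, assuming the OpenAI-style list envelope {"object": "list", "data": [{"id": ...}]}; pick_model is a hypothetical helper, not part of any SDK.

```python
def pick_model(models_payload: dict, preferred: list[str]) -> str:
    """Return the first preferred slug that the gateway actually serves."""
    available = {m["id"] for m in models_payload.get("data", [])}
    for slug in preferred:
        if slug in available:
            return slug
    raise LookupError("none of the preferred models are available")
```

Pass it the parsed body of GET /v1/models together with an ordered preference list, and use the returned slug as the model parameter for /v1/responses.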