Responses API
OpenAI-compatible Responses endpoint. Supports structured input, streaming, function calling, and hosted MCP tools. It parallels Chat Completions but follows the OpenAI Responses API shape.
ℹ Base URL
All endpoints share the same base URL: https://api.assistiv.ai/v1.
POST /v1/responses
Creates a model response for the given input. Compatible with the OpenAI Responses API format.
Auth: End-user key (sk-eu_*) or platform key (sk-plat_*).
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model slug from GET /v1/models (e.g. gpt-4o, claude-3-5-sonnet, gemini-pro). Always discover via the models endpoint; never hardcode. |
| input | string \| array | Required | Shorthand string for simple prompts, or a structured array of input items for multi-turn conversations. See Structured Input below. |
| instructions | string | Optional | System prompt. Equivalent to a system message in chat completions. |
| tools | array | Optional | Tool definitions for function calling. Same format as chat completions, plus type: "mcp" for hosted MCP tools. See MCP Tools below. |
| tool_choice | string \| object | Optional | Controls tool use: "auto", "none", "required", or a specific tool. |
| previous_response_id | string | Optional | Continue from a prior response. The gateway loads the previous conversation context automatically. |
| thread_id | string | Optional | Agent thread identifier. Groups multiple responses into a single conversation thread. |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events (SSE). Default: false. |
| temperature | number | Optional | Sampling temperature between 0 and 2. |
| top_p | number | Optional | Nucleus sampling parameter. |
| max_output_tokens | integer | Optional | Maximum number of tokens to generate. For GPT-5 family models, the minimum is 16. Values below 16 return 422. |
| response_format | object | Optional | Response format constraint. Use {"type": "json_object"} for JSON mode. |
```shell
curl -X POST https://api.assistiv.ai/v1/responses \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is the capital of France?"
  }'
```

Response:

```json
{
  "id": "resp_abc",
  "object": "response",
  "created_at": 1744113000,
  "status": "completed",
  "model": "gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_abc",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 30,
    "total_tokens": 45
  }
}
```

Structured Input
The input field accepts either a plain string (shorthand for a single user message) or an array of input items for multi-turn conversations.
Each item in the array has a role (user, assistant, system) and content. Use the instructions field for the system prompt instead of including a system role in the input array.
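For callers migrating from chat completions, the split between instructions and input can be done mechanically. The helper below is a hypothetical sketch (not part of any SDK): it hoists system messages into instructions and passes the remaining turns through as input items.

```python
def to_responses_payload(model: str, messages: list[dict]) -> dict:
    """Convert a chat-completions message array into a Responses request body.

    System messages are hoisted into the instructions field; user and
    assistant turns pass through unchanged as input items.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    payload = {
        "model": model,
        "input": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        payload["instructions"] = "\n".join(system_parts)
    return payload
```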
```json
{
  "model": "gpt-4o",
  "instructions": "You are a helpful assistant.",
  "input": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    {
      "role": "user",
      "content": "What about Germany?"
    }
  ]
}
```

Hosted MCP Tools
Pass hosted MCP tools with type: "mcp" in the tools array. The gateway connects to the MCP server, discovers available tools, and the model can invoke them during the response.
For full details on OAuth setup, available apps, and tool configuration, see the MCP / Tools page.
```json
{
  "model": "gpt-4o",
  "input": "Create a GitHub issue titled 'Bug report'",
  "tools": [
    {
      "type": "mcp",
      "server_label": "github",
      "server_url": "https://mcp.assistiv.ai/mcp",
      "require_approval": "never",
      "allowed_tools": ["create_issue", "list_issues"]
    }
  ]
}
```

⚠ MCP tools are Responses-only
Hosted MCP tools (type: "mcp") are only supported on /v1/responses. Sending type: "mcp" to /v1/chat/completions returns 422 Unprocessable Entity.
ℹ max_output_tokens minimum
For GPT-5 family models, max_output_tokens must be at least 16 per OpenAI's contract. Values below 16 return 422 Unprocessable Entity.
Function Calling
Pass function tool definitions in the tools array with type: "function". When the model decides to call a function, the response contains an output item with type: "function_call".
Execute the function on your side, then send the result back using previous_response_id and an input item with type: "function_call_output". Continue the loop until the model returns a text message.
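The translation step in that loop can be sketched without an SDK. In the snippet below, get_weather is a hypothetical local implementation, and function_call_outputs only shows how function_call output items become function_call_output input items for the follow-up request; everything else (HTTP, retries) is omitted.

```python
import json

def get_weather(location: str) -> str:
    """Hypothetical local implementation; swap in a real lookup."""
    return "15°C, cloudy"

# Registry mapping tool names the model may call to local callables.
LOCAL_FUNCTIONS = {"get_weather": get_weather}

def function_call_outputs(response: dict) -> list[dict]:
    """Turn function_call output items into function_call_output input items,
    to be sent back with previous_response_id set to the response's id."""
    outputs = []
    for item in response["output"]:
        if item["type"] != "function_call":
            continue
        fn = LOCAL_FUNCTIONS[item["name"]]
        args = json.loads(item["arguments"])  # arguments arrive as a JSON string
        outputs.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": fn(**args),
        })
    return outputs
```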
```json
{
  "model": "gpt-4o",
  "input": "What's the weather in London?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
```

Response with a function_call output item:

```json
{
  "id": "resp_abc",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc",
      "call_id": "call_abc",
      "name": "get_weather",
      "arguments": "{\"location\":\"London\"}"
    }
  ]
}
```

Send the result back using previous_response_id:

```json
{
  "model": "gpt-4o",
  "previous_response_id": "resp_abc",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc",
      "output": "15°C, cloudy"
    }
  ]
}
```

Streaming
Set stream: true to receive a stream of Server-Sent Events. Unlike chat completions, the Responses API uses typed event names.
Streaming Events
| Event | Description |
|---|---|
| response.created | Response object created |
| response.output_item.added | New output item started |
| response.content_part.added | Content part started within an output item |
| response.output_text.delta | Incremental text chunk |
| response.output_text.done | Text output complete |
| response.content_part.done | Content part finished |
| response.output_item.done | Output item finished |
| response.completed | Full response complete (includes usage) |

```shell
curl -X POST https://api.assistiv.ai/v1/responses \
  -H "Authorization: Bearer sk-eu_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Write a haiku about code",
    "stream": true
  }'

# Response (Server-Sent Events):
# event: response.created
# data: {"type":"response.created","response":{"id":"resp_abc",...}}
#
# event: response.output_item.added
# data: {"type":"response.output_item.added",...}
#
# event: response.content_part.added
# data: {"type":"response.content_part.added",...}
#
# event: response.output_text.delta
# data: {"type":"response.output_text.delta","delta":"Gentle "}
#
# event: response.output_text.delta
# data: {"type":"response.output_text.delta","delta":"keystrokes"}
#
# event: response.output_text.done
# data: {"type":"response.output_text.done","text":"..."}
#
# event: response.completed
# data: {"type":"response.completed","response":{...,"usage":{...}}}
```

ℹ Stream interruption
If the upstream LLM connection drops mid-stream, tokens already emitted are billed normally. Distinguish "completed" from "interrupted" on the client side by checking whether you received the final response.completed event.
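One way to apply that check on the client is to fold the typed events into accumulated text plus a completion flag. This sketch assumes the SSE stream has already been parsed into (event name, data) pairs; the helper itself is illustrative, not part of any SDK.

```python
def collect_stream(events):
    """Fold typed streaming events into (text, completed, usage).

    completed stays False unless a response.completed event arrives,
    which is how an interrupted stream is distinguished from a
    completed one.
    """
    parts, completed, usage = [], False, None
    for name, data in events:
        if name == "response.output_text.delta":
            parts.append(data["delta"])
        elif name == "response.completed":
            completed = True
            usage = data["response"].get("usage")
    return "".join(parts), completed, usage
```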
Pre-flight Checks
The Responses API runs the same pre-flight and billing pipeline as Chat Completions. Before calling the upstream LLM, the gateway checks in order:
1. Rate limit for inference. Returns 429 if exceeded.
2. Suspension: if is_suspended=true, returns 402 with code budget_suspended.
3. Budget: remaining_usd must be > 0. Returns 402 with code budget_exhausted.
4. Wallet balance: returns 402 with code wallet_insufficient.

⚠ Parse error.code, not just the status
All three payment failures return 402 Payment Required but require different UX. Parse error.code to show the right message. See Chat Completions pre-flight for the full error code table.
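A sketch of that branching, assuming the OpenAI-style error envelope {"error": {"code": ...}}; the user-facing strings here are placeholders to replace with your product's own copy.

```python
# Placeholder UX copy keyed by error.code; adjust wording to your product.
MESSAGES_402 = {
    "budget_suspended": "Your budget is suspended. Contact your administrator.",
    "budget_exhausted": "Your budget is used up for this period.",
    "wallet_insufficient": "The platform wallet has insufficient funds.",
}

def message_for_402(body: dict) -> str:
    """Map a 402 response body to user-facing copy via error.code."""
    code = (body.get("error") or {}).get("code")
    return MESSAGES_402.get(code, "Payment required.")
```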
Using with OpenAI SDK
Point the base URL at Assistiv and use the end-user key as the API key. Always pass a slug from GET /v1/models, not a hardcoded string.
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-eu_your_end_user_key",
    base_url="https://api.assistiv.ai/v1",
)

# Simple string input
response = client.responses.create(
    model="gpt-4o",
    input="What is the capital of France?",
)
print(response.output[0].content[0].text)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-eu_your_end_user_key",
  baseURL: "https://api.assistiv.ai/v1",
});

const response = await client.responses.create({
  model: "gpt-4o",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);
```
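To honor the "never hardcode" rule, one option is to choose a slug at startup from the parsed JSON of GET /v1/models. A minimal sketch, assuming the OpenAI-style list envelope {"object": "list", "data": [{"id": ...}]}; pick_model is a hypothetical helper, not part of any SDK.

```python
def pick_model(models_payload: dict, preferred: list[str]) -> str:
    """Return the first preferred slug that the gateway actually serves."""
    available = {m["id"] for m in models_payload.get("data", [])}
    for slug in preferred:
        if slug in available:
            return slug
    raise LookupError("none of the preferred models are available")
```

Pass it the parsed body of GET /v1/models together with an ordered preference list, and use the returned slug as the model parameter for /v1/responses.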