The complete AI backend, built for startups.
Routing, tools, rate limits, billing, and logs — every piece a startup needs to ship an AI product, in one SDK call.
Routing
Every model. Every provider. One SDK that routes both.
- Failover
- BYOK
- Region pin
- Weighted
- Retries
- Health
Six products. One backend.
Each one ships independently. Spin one up, the rest still work. Click any note to deep-dive.
One SDK call → any model from any provider. Same model across deployments, with failover.
We host. We auth your end users. We verify each tool call. You ship the agent.
Your agent doesn't need every skill loaded. It searches at the moment of need, fetches the folder, runs it, leaves a review.
Three gate types: rate, wallet balance, usage-per-time. Set per-platform or per-end-user.
You handle payments. You top up via API. We debit. End users can call Assistiv directly.
Full trace per request. Tokens, latency, cost on every step. Replay + eval built-in.
One call in, one log out. We do the six things in between.
Your app makes one call. We check the user's limits, route to a provider, fetch any skills, run any tools, debit the wallet, and write a log row. You handle the feature. We handle the rest.
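A minimal sketch of that single call, assuming an OpenAI-compatible client pointed at Assistiv. The base URL path and the end-user field are illustrative, not the exact SDK surface.

```ts
// Sketch only: the endpoint path and end-user field are assumptions, not documented API surface.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ASSISTIV_API_KEY,   // your platform key
  baseURL: "https://api.assistiv.ai/v1",  // assumed OpenAI-compatible endpoint
});

// One call in: limits checked, request routed, tools run, wallet debited, log row written.
const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize my open tickets" }],
  user: "end_user_123",                   // hypothetical end-user attribution
});

console.log(res.choices[0].message.content); // one log row lands in the dashboard
```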
Everything between
your app and the model.
Each one ships independently. Each one replaces a stack of glue code you'd otherwise maintain. Click any product to deep-dive.
Routing
Every model. Every provider. One SDK that routes both.
Tools
Hosted MCP. We host the servers, handle end-user auth, and verify every tool call.
Skills
Stop curating skill libraries. Agents find the right one at runtime, fetch it, run it, and rate it.
Limits
Per-platform AND per-end-user. Rate, wallet-balance, and usage-per-time gates.
Billing
Per-user wallets you top up via API. Plus client-side calling — end users hit Assistiv directly.
Logs
LangSmith-grade observability. Full traces, token + latency + cost, replay, eval.
Every model. Every provider. One SDK that routes both.
Two routes in one. (1) Every model from every provider — OpenAI, Anthropic, Google, Bedrock, Mistral, Groq — through one OpenAI-compatible call. (2) The same model across provider deployments — gpt-4o on OpenAI ↔ Azure, claude-3.5 on Anthropic ↔ Bedrock — with priority and failover.
- ✓Every model from every provider — one SDK call
- ✓Same model across providers (OpenAI ↔ Azure, Anthropic ↔ Bedrock)
- ✓Priority + weighted routing, sub-200ms failover
- ✓BYOK or Assistiv-managed keys, swap freely
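One way that setup could be expressed, sketched as a hypothetical routing config. The bullets above state priority, weighting, regions, retries, and BYOK; the exact schema here is an assumption.

```ts
// Hypothetical routing config: priority tiers, weights, region pinning, retries, and BYOK.
const route = {
  model: "gpt-4o",
  deployments: [
    { provider: "openai", priority: 1, weight: 70, key: "byok:openai-prod" }, // your key
    { provider: "azure",  priority: 1, weight: 30, region: "eu" },            // region pin
    { provider: "openai", priority: 2, key: "assistiv-managed" },             // failover tier
  ],
  retries: { max: 2, failoverAfterMs: 200 }, // sub-200ms failover target
};
```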
Hosted MCP. We host the servers, handle end-user auth, and verify every tool call.
We host the MCP server. We handle end-user OAuth in your dashboard, under your brand. We verify every tool call before it runs — scopes, allowlists, dry-run when you want it. You write zero OAuth callbacks, store zero refresh tokens, and audit every action.
- ✓We host the MCP servers — 30+ apps live, growing weekly
- ✓We handle end-user auth — OAuth in your dashboard, your brand
- ✓Verified tool calls — scopes, allowlists, dry-run, audit row each call
- ✓Bring-your-own MCP servers supported alongside hosted
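A hypothetical shape for wiring hosted MCP tools into a request. Scopes, allowlists, dry-run, and bring-your-own servers come from the bullets above; the field names are assumptions.

```ts
// Hypothetical tool configuration for one request; names and schema are illustrative.
const tools = {
  hostedMcp: [
    {
      app: "github",                       // one of the 30+ hosted apps
      endUser: "end_user_123",             // OAuth ran in your dashboard, under your brand
      scopes: ["issues:read", "pulls:write"],
      allowlist: ["list_issues", "create_pull_request"],
      dryRun: false,                       // flip to true to verify calls without executing
    },
  ],
  customMcp: [
    { url: "https://mcp.internal.example.com" }, // BYO server alongside hosted ones
  ],
};
```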
Stop curating skill libraries. Agents find the right one at runtime, fetch it, run it, and rate it.
Stuffing 40 skills into the system prompt taxes every token, distracts the model, and rots into a museum nobody uses. Skills inverts this — agents call find_skill at the moment they need a capability, fetch the whole folder, and run it. Reviews compound. The catalog gets sharper without you curating.
- ✓Stop loading skills. Agents call find_skill at the moment they need one
- ✓Real reviews compound — accuracy, value, efficiency, clarity drive ranking
- ✓Plugin-bundle repos with byte-identical copies dedupe to one row
- ✓Private skills are end-user-owned and survive API key rotation
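An illustrative agent-side flow. find_skill is named above; the package name, client handle, and the fetch/run/review signatures are assumptions.

```ts
// Sketch of the runtime loop: find a skill, fetch the folder, run it, leave a review.
import Assistiv from "@assistiv/sdk"; // hypothetical package name

const assistiv = new Assistiv({ apiKey: process.env.ASSISTIV_API_KEY });

const hit = await assistiv.skills.find({ query: "convert HEIC batches to WebP" }); // find_skill
const skill = await assistiv.skills.fetch(hit.id);   // pulls the whole folder, not a snippet
const result = await skill.run({ input: ["a.heic", "b.heic"] });
await assistiv.skills.review(hit.id, {                // reviews compound into ranking
  accuracy: 5, value: 5, efficiency: 4, clarity: 5,
});
```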
Per-platform AND per-end-user. Rate, wallet-balance, and usage-per-time gates.
Three kinds of gate. Rate limits (RPM, TPM, sliding window, token bucket). Wallet-balance gates (block when below threshold). Usage-per-time budgets ($X per day/week/month). Set them on the platform, on each end-user, or both. Enforced before the provider sees the call.
- ✓Per-platform gates (you set defaults)
- ✓Per-end-user gates (you set per signup)
- ✓Rate limits — RPM, TPM, sliding window, token bucket
- ✓Wallet-balance + usage-per-time budgets
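A sketch of setting gates, assuming a hypothetical limits call. The three gate kinds are from the copy above; the client handle, method name, and schema are illustrative.

```ts
// Hypothetical gate configuration for one end-user (or the platform default).
import Assistiv from "@assistiv/sdk"; // hypothetical package name

const assistiv = new Assistiv({ apiKey: process.env.ASSISTIV_API_KEY });

await assistiv.limits.set({
  scope: { endUser: "end_user_123" },          // or { platform: true } for defaults
  rate: { rpm: 60, tpm: 100_000, kind: "token_bucket" },
  walletBalance: { blockBelowUsd: 0.5 },       // block when the wallet dips under $0.50
  usageBudget: { usd: 25, per: "month" },      // $X per day/week/month
});
```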
Per-user wallets you top up via API. Plus client-side calling — end users hit Assistiv directly.
Each end-user gets a wallet. You handle payments however you want — Stripe, Paddle, invoicing, internal credits — and call walletTopUp() to credit the wallet. We debit per token, per tool, per request — atomic, audit-logged. End users can also call Assistiv directly with their own sk-eu_ key, bypassing your server entirely.
- ✓You own the payment flow; we own the wallet ledger
- ✓walletTopUp(user, amount) — one API call, idempotent
- ✓Client-side calling — end users hit api.assistiv.ai directly
- ✓Tokens, tools, agents all settle against the same wallet
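A sketch of both sides. walletTopUp(user, amount) and the sk-eu_ key are named above; the package name, the amount unit, and the client-side setup are assumptions.

```ts
// Server side: credit the wallet after your own payment flow settles.
import Assistiv from "@assistiv/sdk"; // hypothetical package name
import OpenAI from "openai";

const assistiv = new Assistiv({ apiKey: process.env.ASSISTIV_API_KEY });
await assistiv.walletTopUp("end_user_123", 10.0); // e.g. credit $10; unit is an assumption, call is idempotent

// Client side: the end user calls api.assistiv.ai directly with their own sk-eu_ key,
// bypassing your server; debits settle against their wallet.
const endUserClient = new OpenAI({
  apiKey: "sk-eu_...",                    // per-end-user key
  baseURL: "https://api.assistiv.ai/v1",  // assumed endpoint path
  dangerouslyAllowBrowser: true,          // OpenAI SDK flag for in-browser use
});
```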
LangSmith-grade observability. Full traces, token + latency + cost, replay, eval.
Every request becomes a full trace: each LLM call, retry, and tool step in the chain. Tokens, latency, cost, and status attached at every step. Filter by model, user, route, error. Replay any trace. Score traces against datasets. Built-in — not a separate SaaS.
- ✓Full traces — every step, every retry, every tool call
- ✓Tokens, latency, cost on every step (per user, per model, per route)
- ✓Replay traces, score against datasets, run evals
- ✓Webhook export to your warehouse — Snowflake, BigQuery, S3
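A sketch of pulling and exporting traces. Filtering, replay, datasets, and webhook export are stated above; the client handle and method shapes are assumptions.

```ts
// Hypothetical observability calls; names are illustrative.
import Assistiv from "@assistiv/sdk"; // hypothetical package name

const assistiv = new Assistiv({ apiKey: process.env.ASSISTIV_API_KEY });

const traces = await assistiv.logs.traces({
  model: "gpt-4o", endUser: "end_user_123", status: "error", since: "24h",
});
await assistiv.logs.replay(traces[0].id);  // re-run the exact trace
await assistiv.logs.exportWebhook({        // one canonical row per request, real-time
  url: "https://hooks.example.com/assistiv",
});
```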
Eighteen tabs of glue code. Closed.
Every product in the suite replaces something you'd otherwise glue together. Here's the receipt.
- LiteLLM glue code
- Hand-rolled retry wrappers
- Status-page polling
- OAuth swamp
- Token refresh code
- Custom GitHub/Slack/Gmail integrations
- Bloated system prompts
- Hand-curated skill lists
- Stale prompt vendoring
- Sliding-window Redis
- Free-tier abuse triage
- 2 a.m. throttle pages
- DIY wallet tables
- Per-user metering
- Token-counting cron jobs
- LangSmith
- Datadog AI spend reports
- Custom BigQuery cost views
We ship every week.
Small surface area means we can move fast. Here's the last six weeks.
Skills — agent-fetchable code packages
Sixth product in the suite. Agents discover, fetch, run, rate. No more curated skill lists.
Linear, Notion, Jira MCPs
Three new hosted MCP servers. Same OAuth flow, same wallet debits.
Region-pinned routing
Pin requests to US, EU, or AU regions. Per-key, per-user, or per-tier.
Manual wallet adjustments
Issue refunds, holds, and credits via a single endpoint. Auditable.
Token-bucket rate limits
New limit kind alongside sliding-window and fixed-budget. Mix freely.
Webhook log export
Stream one canonical row per request to your warehouse. Real-time.
Free to start.
Usage on top.
Generous free tier across every product. Pay-as-you-go after that — top up the wallet, no monthly base, no "contact sales."
For prototyping. All 6 products on, generous limits across the board.
- 10k monthly active end-users
- 5k traces/month, 7-day retention
- BYOK requests — unlimited, free
- 5k skill calls + 1k MCP tool calls / month, free
For shipping. No monthly base. Top up the wallet, drain on usage.
- Provider price + 5% on tokens
- BYOK requests stay free
- $0.00005 / MAU after 10k
- $0.0025 / trace after 5k/mo
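A rough worked example under the pay-as-you-go prices above, with illustrative numbers only.

```ts
// Illustrative monthly bill at 50k MAU and 20k traces, all requests on BYOK keys
// (so token spend carries no commission). An example, not a quote.
const mauCost = (50_000 - 10_000) * 0.00005; // $2.00 over the free 10k MAU
const traceCost = (20_000 - 5_000) * 0.0025; // $37.50 over the free 5k traces/mo
console.log(mauCost + traceCost);            // ≈ $39.50
```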
For real volume. Negotiated commission, dedicated support, SOC 2 docs.
- Commission below 5% at volume
- Dedicated routing region
- SOC 2 + DPA on request
- Slack channel with founders
You build the product.
We are the backend.
We onboard 1–2 indie startups a week. We reply within a day; you're integrated within a week.