# Step 3: Budgets & Rate Limits

Control per-user spending and request rates. Budgets cap real AI spend in USD; rate-limit overrides tune RPM/TPM/RPD per user. This step covers plan changes, topups, manual debits (chargebacks/refunds), suspension, the full transaction ledger, and the display balance model.

All management endpoints require a **platform key** (`sk-plat_*`). End-user keys (`sk-eu_*`) are hard-blocked from mutating budgets via a scope guard — even against their own user — so browser-side code must never call the management routes. End users query their own balance via `GET /v1/me/budget` (see "What End Users See" below).

**Full ledger: every budget write is recorded.** Topups, manual debits, inference spend, suspensions, and opening balances all land in `end_user_budget_transactions`. Query via `GET /budget/transactions` for audit trails and reconciliation. See the API Reference below.

---

## Before you implement — understand the platform

### 1. Find where plan changes happen

Search the codebase for upgrade/downgrade flows. Look for:

- Stripe webhook handlers (`customer.subscription.updated`, `invoice.payment_succeeded`)
- Plan change API routes (`POST /upgrade`, `PATCH /subscription`)
- Admin actions that change a user's tier
- Trial expiration logic

**Ask the developer:** "Where in your codebase does a user's plan change? I need to update their Assistiv budget and rate limits at that point."

### 2. Understand the display model (dual ledger + rules engine)

End users have TWO ledgers on the same row, decoupled by design:

```
USD ledger (cost control — always on, mandatory)
  max_usd        ← real AI spend cap you set
  used_usd       ← debited by actual provider cost per call
  remaining_usd  = max_usd - used_usd
                 → hidden from end users entirely

Display ledger (product billing — opt-in)
  max_display    ← admin sets in your platform's unit (credits / messages / etc.)
  used_display   ← debited by RULES + admin adjusts
                 → returned to end users via /v1/me/budget in the unit you configured
```

The display ledger only activates when the platform admin opts in (settings `enabled: true`) AND a `max_display` is initialized for the end user (via `POST /wallet`). When either gate fails, `GET /v1/me/budget` returns 404.

**Rules engine.** Configure on the dashboard or via `PATCH /v1/platforms/{pid}` with `settings.end_user_wallet.rules`:

```jsonc
{
  "enabled": true,
  "unit": "credits",
  "rules": [
    { "trigger": "inference_call", "amount": 1 },      // 1 credit per call
    { "trigger": "tool_call", "amount": 5 },           // 5 per MCP/skill invocation
    { "trigger": "usd_spent", "amount_per_usd": 20 }   // 20 × actual cost in USD
  ]
}
```

Rules are **additive**: every matching rule contributes to the post-call `used_display` delta. For example, under the rules above, one call that costs $0.05 and makes two tool invocations adds 1 + (2 × 5) + (0.05 × 20) = 12 credits. Max 8 rules; no duplicate triggers.

The dashboard ships four one-click presets:

- **Match real cost** → `[{trigger: "usd_spent", amount_per_usd: N}]` (legacy USD shape)
- **Flat per message** → `[{trigger: "inference_call", amount: 1}]`
- **Per message + per tool** → both `inference_call` and `tool_call` rules
- **Custom** → full rules-array editor

The two ledgers run in parallel. If either one is exhausted, inference returns 402. They're allowed to drift (e.g. flat-rate users hit display limits while you still have USD budget). Admin endpoints (`POST /wallet/adjust { delta, reason }`) let you credit / debit / refund the display ledger without touching the USD ledger. Full reference: `/docs/api-reference/end-user-wallet`.

### 3. Decide plan-change policy

**Ask the developer:**

a) "On upgrade, should the user's existing spend carry over, or do they start fresh?" (Carry over = just PATCH `max_usd` higher. Fresh = DELETE the budget and POST a new one, which starts at `used_usd = 0`.)

b) "On downgrade mid-period, if a user already spent more than the new cap, should they be blocked immediately or allowed until the next reset?"
c) "Do you need to temporarily pause a user without deleting their account (abuse review, failed payment, manual hold)?" If yes, use the new `is_suspended` flag instead of `is_active` — suspension is reversible, keeps the budget row queryable, and still allows manual refunds/chargeback debits while blocking inference. See `PATCH /budget { is_suspended }` below.

---

## Changing Plans

When a user upgrades or downgrades, update their budget and rate limits. Use the same `PLANS` config defined in Step 2.

```typescript
// API_BASE, PLATFORM_ID, PLATFORM_KEY, and PLANS are assumed to be defined as in Step 2.

// Helper for platform-key requests
const plat = (path: string, init: RequestInit = {}) =>
  fetch(`${API_BASE}/platforms/${PLATFORM_ID}${path}`, {
    ...init,
    headers: {
      Authorization: `Bearer ${PLATFORM_KEY}`,
      "Content-Type": "application/json",
      ...(init.headers || {}),
    },
  });

// Throw on non-2xx so a failed POST/PATCH doesn't silently leave the user half-migrated
async function ensureOk(res: Response): Promise<Response> {
  if (!res.ok) throw new Error(`Assistiv API ${res.status}: ${await res.text()}`);
  return res;
}

async function changePlan(assistivUserId: string, newPlan: string) {
  const planConfig = PLANS[newPlan];
  if (!planConfig) throw new Error(`Unknown plan: ${newPlan}`);

  // Update budget: DELETE + POST gives a clean reset (the new budget starts at used_usd = 0).
  // DELETE results are not checked: the user may not have a budget yet.
  await plat(`/end-users/${assistivUserId}/budget`, { method: "DELETE" });
  await ensureOk(
    await plat(`/end-users/${assistivUserId}/budget`, {
      method: "POST",
      body: JSON.stringify({
        max_usd: planConfig.backend_budget_usd,
        period: planConfig.period,
        auto_replenish: planConfig.period !== "one_time",
        replenish_amount: planConfig.backend_budget_usd,
      }),
    }),
  );

  // Update rate limits: drop any old override, then create one if the plan defines any limit
  await plat(`/end-users/${assistivUserId}/rate-limits`, { method: "DELETE" });
  if (planConfig.rpm_limit || planConfig.tpm_limit || planConfig.rpd_limit) {
    await ensureOk(
      await plat(`/end-users/${assistivUserId}/rate-limits`, {
        method: "POST",
        body: JSON.stringify({
          rpm_limit: planConfig.rpm_limit ?? null,
          tpm_limit: planConfig.tpm_limit ?? null,
          rpd_limit: planConfig.rpd_limit ?? null,
        }),
      }),
    );
  }

  // Update the user's plan metadata
  await ensureOk(
    await plat(`/end-users/${assistivUserId}`, {
      method: "PATCH",
      body: JSON.stringify({ metadata: { plan: newPlan } }),
    }),
  );
}
```

```python
import requests

# API_BASE, PLATFORM_ID, platform_headers, and PLANS are assumed to be defined as in Step 2.

def change_plan(assistiv_user_id: str, new_plan: str):
    plan_config = PLANS.get(new_plan)
    if plan_config is None:
        raise ValueError(f"Unknown plan: {new_plan}")
    base = f"{API_BASE}/platforms/{PLATFORM_ID}/end-users/{assistiv_user_id}"

    # Update budget: DELETE + POST gives a clean reset (the new budget starts at used_usd = 0).
    # The DELETE result is not checked: the user may not have a budget yet.
    requests.delete(f"{base}/budget", headers=platform_headers)
    requests.post(
        f"{base}/budget",
        headers=platform_headers,
        json={
            "max_usd": plan_config["backend_budget_usd"],
            "period": plan_config["period"],
            "auto_replenish": plan_config["period"] != "one_time",
            "replenish_amount": plan_config["backend_budget_usd"],
        },
    ).raise_for_status()

    # Update rate limits: drop any old override, then create one if the plan defines any limit
    requests.delete(f"{base}/rate-limits", headers=platform_headers)
    if any(plan_config.get(k) for k in ("rpm_limit", "tpm_limit", "rpd_limit")):
        requests.post(
            f"{base}/rate-limits",
            headers=platform_headers,
            json={
                "rpm_limit": plan_config.get("rpm_limit"),
                "tpm_limit": plan_config.get("tpm_limit"),
                "rpd_limit": plan_config.get("rpd_limit"),
            },
        ).raise_for_status()

    # Update the user's plan metadata
    requests.patch(
        base,
        headers=platform_headers,
        json={"metadata": {"plan": new_plan}},
    ).raise_for_status()
```

### Caveats on plan changes

- **Monthly budget reset is automatic.** `period: "monthly"` with `auto_replenish: true` resets `used_usd` to 0 and `max_usd` to `replenish_amount` on the 1st of each calendar month. No cron needed.
- **Mid-month downgrade.** If a user spent $1.50 on Pro (`max_usd=2.00`) and you downgrade to Free (`max_usd=0.50`), DELETE + POST resets `used_usd` to 0, giving them a fresh $0.50 budget. If you want to carry over spend, use PATCH instead of DELETE + POST. With PATCH, `used_usd` stays at 1.50, above the new 0.50 cap, so inference is blocked until the next period reset.
- **Rate limit cache invalidation.** POST/DELETE on rate-limit configs invalidates the `rlcfg:*` Redis key server-side. Tightened limits take effect on the next request — no delay.
- **Sliding-window counter survives tier change.** If you tighten a user's RPM from 30 to 5, their existing in-window request counts still count against the new limit. They may get 429s until the window rolls.

---

## What End Users See (client-side)

End users query their own budget via `GET /v1/me/budget` using their `sk-eu_*` key. The response contains **display values only** — no raw USD amounts are exposed:

```json
{
  "display_balance": 10000,
  "display_remaining": 8000,
  "display_unit": "credits",
  "period": "monthly",
  "period_start": "2026-04-01T00:00:00Z",
  "auto_replenish": true,
  "is_active": true
}
```

- `display_balance` — total allocation for the period (max_usd × credits_per_usd)
- `display_remaining` — what's left (remaining_usd × credits_per_usd)
- `display_unit` — the label configured on the dashboard

Your client-side UI reads these fields directly. No conversion math needed. When the user's plan changes, `display_balance` updates automatically because it's derived from the new `max_usd`.

Returns 404 if no active budget exists for this user.

---

## API Reference: Budgets

### POST /v1/platforms/{platformId}/end-users/{endUserId}/budget

Create a budget. Auth: platform key.

Atomically writes an opening-balance ledger row (`type='opening'`) so `GET /budget/transactions` always shows the budget's full history from the moment it was created.
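In the DELETE + POST plan-change flow, a rejected POST leaves the user with no active budget, so they spend freely against the platform wallet until the call is fixed. It can therefore help to check a body against the field rules for this endpoint before sending it. A minimal sketch, assuming you build bodies as plain dicts; `build_budget_body` is a hypothetical helper (not part of any Assistiv SDK) that encodes only the constraints listed under "Fields", and omitting unset optional fields (rather than sending `null`) is an assumption.

```python
from typing import Optional

VALID_PERIODS = {"one_time", "monthly", "daily"}


def build_budget_body(
    max_usd: float,
    period: str = "one_time",
    auto_replenish: bool = False,
    replenish_amount: Optional[float] = None,
    low_balance_threshold: Optional[float] = None,
) -> dict:
    """Return a POST /budget body, raising ValueError on a documented field-rule violation.

    Hypothetical helper: mirrors the "Fields" list for this endpoint, nothing more.
    """
    if max_usd <= 0:
        raise ValueError("max_usd must be > 0")
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    if auto_replenish and replenish_amount is None:
        raise ValueError("replenish_amount is required when auto_replenish is true")
    if replenish_amount is not None and replenish_amount <= 0:
        raise ValueError("replenish_amount must be > 0")
    if low_balance_threshold is not None and low_balance_threshold < 0:
        raise ValueError("low_balance_threshold must be >= 0")

    body = {"max_usd": max_usd, "period": period, "auto_replenish": auto_replenish}
    # Unset optional fields are left out of the body rather than sent as null (assumption)
    if replenish_amount is not None:
        body["replenish_amount"] = replenish_amount
    if low_balance_threshold is not None:
        body["low_balance_threshold"] = low_balance_threshold
    return body


# The request body documented for this endpoint passes unchanged
example = build_budget_body(2.00, "monthly", True, 2.00, 0.10)
```

Raising before the request keeps the error in your own logs with a precise message, instead of relying on the API's validation response after the old budget is already gone.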
Request body:

```json
{
  "max_usd": 2.00,
  "period": "monthly",
  "auto_replenish": true,
  "replenish_amount": 2.00,
  "low_balance_threshold": 0.10
}
```

Fields:

- `max_usd` (number, required, > 0) — Maximum real USD spend in the period.
- `period` (string) — `"one_time"` (default), `"monthly"`, or `"daily"`.
- `auto_replenish` (boolean, default false) — Reset each period.
- `replenish_amount` (number, > 0) — Required if `auto_replenish` is true.
- `low_balance_threshold` (number, >= 0) — Threshold for low-balance alerts. Epic 3: this is the trigger value for the edge-triggered `budget.low_balance` webhook event. Set it if you subscribe to that event. See Step 6 for details.

Response (201):

```json
{
  "id": "uuid",
  "platform_id": "uuid",
  "end_user_id": "uuid",
  "max_usd": 2.00,
  "used_usd": 0.00,
  "period": "monthly",
  "period_start": "2026-04-01T00:00:00Z",
  "auto_replenish": true,
  "replenish_amount": 2.00,
  "low_balance_threshold": 0.10,
  "is_active": true,
  "created_at": "2026-04-08T10:30:00Z",
  "updated_at": "2026-04-08T10:30:00Z"
}
```

Note: the platform-key response includes raw USD fields (`max_usd`, `used_usd`) for admin visibility. End-user-key responses (`GET /v1/me/budget`) return display values instead.

### GET /v1/platforms/{platformId}/end-users/{endUserId}/budget

Get a user's budget. Auth: platform key. Response includes a computed `remaining_usd = max_usd - used_usd`.

### PATCH /v1/platforms/{platformId}/end-users/{endUserId}/budget

Update a budget. Auth: platform key. All fields optional:

```json
{
  "max_usd": 5.00,
  "auto_replenish": true,
  "replenish_amount": 5.00,
  "low_balance_threshold": 0.50,
  "is_active": true,
  "is_suspended": false,
  "reason": "upgrade_to_pro",
  "metadata": { "stripe_subscription_id": "sub_..." }
}
```

**`is_suspended`** (new) — flip to `true` to temporarily pause inference for this user without losing budget state.
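Suspension, exhaustion, and a missing budget each lead to different behavior at inference time. A client-side model of those gates can help support tooling explain why a user is blocked. A minimal sketch, assuming a budget dict shaped like the platform-key `GET /budget` response plus an `is_suspended` field; `predict_inference_gate` is hypothetical, and the precedence of suspension over exhaustion is an assumption, since this page documents each code separately. It does not model the display ledger (which can also return 402) or rate limits.

```python
from typing import Optional


def predict_inference_gate(budget: Optional[dict]) -> str:
    """Predict an inference outcome from a budget row (client-side model, not server code).

    Returns "no_budget", "budget_suspended", "budget_exhausted", or "ok".
    """
    # No budget, or a soft-deleted one: the user spends against the platform wallet
    if budget is None or not budget["is_active"]:
        return "no_budget"
    # Assumed precedence: a suspended user sees budget_suspended even when also out of funds
    if budget.get("is_suspended", False):
        return "budget_suspended"
    remaining_usd = budget["max_usd"] - budget["used_usd"]
    # Inference is blocked once remaining_usd <= 0, including negative balances from manual debits
    if remaining_usd <= 0:
        return "budget_exhausted"
    return "ok"
```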
While `is_suspended=true`:

- Inference calls return `402 budget_suspended` (distinct code from `budget_exhausted`)
- `GET /v1/me/budget` still returns the row (not 404)
- Topups still land, manual debits still land (you can still record chargebacks)
- Dashboard still renders normally

Flip to `false` to un-pause. The transition writes a `type='adjustment'` ledger row with `metadata.changed_fields.is_suspended`, and Epic 3 fires `budget.suspended` / `budget.unsuspended` outbound webhooks on each transition (if you've registered an endpoint for them).

**`reason`** and **`metadata`** (new in Epic 1) flow through into the ledger row and into the outbound webhook payload. Use them for audit breadcrumbs.

**Idempotency-Key header:** PATCH honors `Idempotency-Key` like the topup and debit endpoints. Same key + same body → replay the cached ledger row. Same key + different body → 409 Conflict.

### DELETE /v1/platforms/{platformId}/end-users/{endUserId}/budget

Soft-delete the user's budget (flips `is_active` to `false`). Auth: platform key. Writes a `type='adjustment'` ledger row marking the delete so the audit trail survives. The user then spends freely against the platform wallet until you recreate one.

Note: for temporary pauses, prefer `PATCH { is_suspended: true }` instead. Suspension is reversible and keeps the budget row queryable; DELETE is for hard removal (account cancellation, GDPR erasure).

### POST /v1/platforms/{platformId}/end-users/{endUserId}/budget/topup

Add USD to a user's budget (increases `max_usd`). Auth: platform key. Use for one-time credit grants or add-on purchases.

Request body:

```json
{
  "amount_usd": 1.00,
  "reason": "promo_grant",
  "metadata": { "promo_code": "WELCOME10" }
}
```

**`Idempotency-Key` header (strongly recommended).** Epic 1 makes topup strictly idempotent:

```http
POST /v1/platforms/.../budget/topup
Authorization: Bearer sk-plat_...
Idempotency-Key: stripe-invoice-in_1Mt2P3xyz
Content-Type: application/json

{"amount_usd": 10.00, "reason": "stripe_invoice", "metadata": {"invoice_id": "in_1Mt..."}}
```

- Same key + same body → returns the cached ledger row, no double-credit
- Same key + **different** body → `409 Conflict` with `existing_fingerprint` (Stripe-style strict semantics)
- No key → every call applies (legacy behavior; not recommended for production code paths that can retry)

Use a key that is **stable across retries of the same logical operation**:

- Stripe invoice ID, webhook delivery ID, your internal payment ID — good
- `Date.now()`, `uuid()` generated per-retry — bad (every retry re-executes)

The cached replay returns an `idempotent_replay: true` field and the original `transaction` row.

After topup, the user's `display_balance` increases proportionally (e.g. +1.00 USD × 5000 credits_per_usd = +5,000 display credits). Epic 3 fires `budget.topped_up` on success.

### POST /v1/platforms/{platformId}/end-users/{endUserId}/budget/debit

**New in Epic 2.** Manually debit USD from a user's budget (increases `used_usd`). Auth: platform key. Same `Idempotency-Key` contract as topup.

Use for:

- **Chargeback handling** — record a reversal when Stripe disputes an invoice
- **Refunds with a usage component** — when you refund part of a purchase
- **Manual write-offs** — admin adjustment for a support case
- **Batch reconciliation** — apply external ledger deltas

Request body:

```json
{
  "amount_usd": 5.00,
  "reason": "chargeback_dispute_du_1Mt...",
  "metadata": {
    "dispute_id": "du_1Mt...",
    "chargeback_date": "2026-04-11"
  }
}
```

```http
POST /v1/platforms/.../budget/debit
Authorization: Bearer sk-plat_...
Idempotency-Key: chargeback-du_1Mt...

{"amount_usd": 5.00, "reason": "chargeback", "metadata": {"dispute_id": "du_1Mt..."}}
```

**Negative balances are allowed.** Unlike inference debits (which gate on `remaining_usd > 0`), the manual debit endpoint does NOT refuse when the user has insufficient balance. This is intentional — Assistiv is a real ledger, so you can record a chargeback that pushes a user into debt. The inference path still gates at `remaining_usd <= 0`, so the user cannot *spend* new requests while in debt, but you don't need to maintain a parallel "debt ledger" on your side to record the obligation.

**Manual debits bypass `is_suspended`.** You can record chargebacks against a suspended user. Symmetric with topup.

Epic 3 fires `budget.debited` on success (opt-in — off by default in the endpoint subscription form because at LLM cadence inference debits also fire this event). If you enable it, expect high volume.

Response (200):

```json
{
  "success": true,
  "idempotent_replay": false,
  "budget_id": "uuid",
  "max_usd": 10.00,
  "used_usd": 12.00,
  "transaction": {
    "id": "txn-uuid",
    "type": "debit",
    "amount_usd": 5.00,
    "max_usd_after": 10.00,
    "used_usd_after": 12.00,
    "reason": "chargeback_dispute_du_1Mt...",
    "metadata": { "dispute_id": "du_1Mt..." },
    "created_at": "2026-04-11T15:00:00Z"
  }
}
```

### GET /v1/platforms/{platformId}/end-users/{endUserId}/budget/transactions

**New in Epic 1 (extended in Epic 2).** Paginated ledger of every state change to a user's budget. Auth: platform key.
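Because each row records the balances it produced, a reconciliation job can replay a user's history and flag rows whose recorded `max_usd_after` / `used_usd_after` disagree with what the row type implies. A minimal sketch, assuming rows are processed oldest first (as in this endpoint's example response), that `topup` rows add `amount_usd` to `max_usd` and `debit` rows add it to `used_usd` (as the topup and debit endpoints describe), and that `opening` and `adjustment` rows are taken at their recorded `_after` values. `replay_ledger` is a hypothetical helper; amounts go through `Decimal` so microdollar values compare exactly.

```python
from decimal import Decimal


def _d(value) -> Decimal:
    # Convert via str() so JSON floats like 0.1 become the exact Decimal("0.1")
    return Decimal(str(value))


def replay_ledger(rows: list) -> tuple:
    """Replay ledger rows oldest-first; return (max_usd, used_usd, mismatched_row_ids)."""
    max_usd = used_usd = Decimal("0")
    mismatches = []
    for row in rows:
        kind = row["type"]
        if kind == "topup":
            max_usd += _d(row["amount_usd"])
        elif kind == "debit":
            used_usd += _d(row["amount_usd"])
        else:
            # opening / adjustment (incl. period resets): adopt the recorded post-state
            max_usd = _d(row.get("max_usd_after", max_usd))
            used_usd = _d(row.get("used_usd_after", used_usd))
        # Flag a row (once) if its recorded post-state differs from the running totals
        if "max_usd_after" in row and _d(row["max_usd_after"]) != max_usd:
            mismatches.append(row["id"])
        elif "used_usd_after" in row and _d(row["used_usd_after"]) != used_usd:
            mismatches.append(row["id"])
    return max_usd, used_usd, mismatches
```

Negative balances (`used_usd` above `max_usd` after a chargeback) replay without special handling, since the ledger allows them.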
Use for:

- Audit trails
- Per-user monthly statements
- Reconciling against your own billing records
- Debugging "where did this user's balance come from" questions

Query params:

- `since` (ISO 8601 timestamp, optional) — return rows strictly after this time
- `limit` (integer, 1–200, default 50)

Response:

```json
{
  "data": [
    {
      "id": "txn-uuid",
      "budget_id": "uuid",
      "type": "opening",
      "amount_usd": 2.00,
      "max_usd_before": 0.00,
      "max_usd_after": 2.00,
      "used_usd_before": 0.00,
      "used_usd_after": 0.00,
      "reason": "budget_created",
      "metadata": {},
      "actor_key_id": "apk_xxx",
      "actor_type": "platform_key",
      "created_at": "2026-04-08T10:30:00Z"
    },
    {
      "id": "txn-uuid-2",
      "type": "topup",
      "amount_usd": 1.00,
      "max_usd_before": 2.00,
      "max_usd_after": 3.00,
      "used_usd_before": 0.00,
      "used_usd_after": 0.00,
      "reason": "promo_grant",
      "metadata": { "promo_code": "WELCOME10" },
      "created_at": "2026-04-09T12:00:00Z"
    },
    {
      "id": "txn-uuid-3",
      "type": "debit",
      "amount_usd": 0.25,
      "max_usd_after": 3.00,
      "used_usd_after": 0.25,
      "reason": null,
      "metadata": { "model": "gpt-4o-mini", "input_tokens": 120 },
      "actor_type": "end_user_key",
      "created_at": "2026-04-09T12:05:00Z"
    }
  ],
  "limit": 50
}
```

**Row types:**

| `type` | Emitted by | When |
|---|---|---|
| `opening` | `POST /budget` | Once, on budget creation |
| `topup` | `POST /budget/topup` | Platform tops up, promo grant, Stripe invoice |
| `debit` | `POST /budget/debit` OR inference | Manual debit (chargeback/refund) OR automatic on each inference call |
| `adjustment` | `PATCH /budget`, `DELETE /budget`, period reset | Config changes, suspension flips, daily/monthly reset |

Rows include `actor_type` (`platform_key`, `end_user_key`, `supabase_session`, `system`) and `actor_key_id` so you can see exactly who/what made each change. Query with `since=` for incremental reconciliation.

### GET /v1/platforms/{platformId}/budgets

List all end-user budgets on the platform. Auth: platform key. Returns the standard paginated shape.
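For incremental reconciliation against `GET /budget/transactions` with `since=`, a loop can advance a cursor page by page. A minimal sketch, assuming rows come back oldest first and that a page shorter than `limit` means you have caught up. `iter_transactions`, `make_http_fetch`, and the `fetch_page(since, limit)` callable are hypothetical; the page fetcher is injected so the paging logic can be tested without the network, and `make_http_fetch` is one stdlib-only way to build it (no retries or error handling).

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# fetch_page(since, limit) -> the "data" list of ledger rows. Hypothetical; you supply it.
FetchPage = Callable[[Optional[str], int], list]


def iter_transactions(fetch_page: FetchPage, since: Optional[str] = None, limit: int = 200) -> Iterator[dict]:
    """Yield ledger rows newer than `since`, advancing the cursor page by page."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    cursor = since
    while True:
        page = fetch_page(cursor, limit)
        yield from page
        # A short (or empty) page means there is nothing newer right now
        if len(page) < limit:
            return
        # `since` is strictly-after, so the newest row's timestamp becomes the next cursor
        cursor = page[-1]["created_at"]


def make_http_fetch(api_base: str, platform_id: str, end_user_id: str, platform_key: str) -> FetchPage:
    """Build a fetch_page that calls GET .../budget/transactions with a platform key."""
    url = f"{api_base}/platforms/{platform_id}/end-users/{end_user_id}/budget/transactions"

    def fetch_page(since: Optional[str], limit: int) -> list:
        params = {"limit": limit}
        if since is not None:
            params["since"] = since
        req = urllib.request.Request(
            f"{url}?{urllib.parse.urlencode(params)}",
            headers={"Authorization": f"Bearer {platform_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["data"]

    return fetch_page
```

Persist the `created_at` of the last row you processed and pass it as `since` on the next run. Because `since` is strictly-after, rows that share a timestamp with the last row of a full page could be skipped; if that matters for your volume, compare counts against `GET /budget` totals or re-read a small overlap and de-duplicate by `id`.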
### Period Reset Behavior

- `one_time` — Never resets. User is blocked when `used_usd` reaches `max_usd` until you topup or recreate the budget.
- `daily` — `used_usd` resets to 0 every 24h from `period_start`. If `auto_replenish`, `max_usd` also resets to `replenish_amount`.
- `monthly` — Same as daily but uses calendar month boundaries (`period_start + 1 month`).

---

## Platform Wallet

Your platform's pooled credit balance. Every successful inference call debits it atomically. You can read the balance and recent transactions via API to render your own admin UI.

### GET /v1/platforms/{platformId}/wallet

Auth: platform key. Returns the wallet row + the 5 most recent transactions.

Response:

```json
{
  "id": "wallet-uuid",
  "platform_id": "platform-uuid",
  "balance": 24.850000,
  "currency": "usd",
  "low_balance_threshold": 1.00,
  "is_active": true,
  "created_at": "2026-01-15T12:00:00Z",
  "updated_at": "2026-04-09T14:22:00Z",
  "recent_transactions": [
    {
      "id": "txn-uuid",
      "type": "llm_usage",
      "amount": 0.000005,
      "balance_after": 24.850000,
      "description": "Inference: 19 tokens (gpt-4o-mini)",
      "created_at": "2026-04-09T14:22:00Z"
    }
  ]
}
```

`balance` is stored as `numeric(12,6)` — microdollar precision. A chat completion of 20 tokens on a cheap model can move the balance by as little as `0.000002`, so a UI that rounds to 2 decimals will appear stuck.

Transaction `type` values: `top_up`, `llm_usage`, `mcp_usage`, `agent_usage`, `refund`, `adjustment`.

### POST /v1/platforms/{platformId}/wallet/topup

Auth: platform key. Programmatic top-up. Use for dev/test scenarios; production platforms typically top up via Stripe checkout on the dashboard.

Request body:

```json
{ "amount": 10.00, "description": "Manual dev top-up" }
```

### POST /v1/platforms/{platformId}/wallet/checkout

Auth: platform key. Creates a Stripe Checkout Session.

Request body:

```json
{ "amount": 50.00 }
```

Response:

```json
{
  "url": "https://checkout.stripe.com/c/pay/...",
  "session_id": "cs_test_..."
}
```

The checkout URL is valid for 24 hours. The wallet credit lands on `checkout.session.completed` webhook arrival (usually < 1s after payment).

---

## API Reference: Rate Limits

Platforms set default rate limits on the website. Use this API to override them for specific end users (e.g. higher limits for paying tiers).

Resolution order at inference time:

1. Explicit override for this end user (if set via this API)
2. Platform default (set on the website)
3. No limit (pass through)

### POST /v1/platforms/{platformId}/end-users/{endUserId}/rate-limits

Create a per-user override. Auth: platform key. At least one of `rpm_limit`, `tpm_limit`, or `rpd_limit` is required.

Request body:

```json
{
  "rpm_limit": 60,
  "tpm_limit": 100000,
  "rpd_limit": 10000
}
```

Fields:

- `rpm_limit` (integer, > 0) — Requests per minute. Null = no per-minute limit.
- `tpm_limit` (integer, > 0) — Tokens per minute. Null = no TPM limit.
- `rpd_limit` (integer, > 0) — Requests per day. Null = no daily limit.

Response (201):

```json
{
  "id": "uuid",
  "platform_id": "uuid",
  "scope": "end_user",
  "scope_id": "end-user-uuid",
  "rpm_limit": 60,
  "tpm_limit": 100000,
  "rpd_limit": 10000,
  "created_at": "2026-04-08T10:30:00Z",
  "updated_at": "2026-04-08T10:30:00Z"
}
```

### GET /v1/platforms/{platformId}/end-users/{endUserId}/rate-limits

Get the explicit override for a user. Auth: platform key. Returns 404 if no override exists (the user falls back to platform default).

### PATCH /v1/platforms/{platformId}/end-users/{endUserId}/rate-limits

Update the override. Auth: platform key. Any field can be set to null to remove just that limit while keeping others.

### DELETE /v1/platforms/{platformId}/end-users/{endUserId}/rate-limits

Remove the override. Auth: platform key. Returns 204. The user reverts to the platform default.

---

Next: [Step 4 — Inference](https://www.assistiv.ai/docs/integration/step-4-inference.txt)