Costs
Check session usage, inspect remaining quota, and keep token spend in check.
What it is
CrabCode bills by token usage. Every model call is metered through the acosmi gateway — there is no direct billing relationship with upstream providers:
- Subscribers — debit against your entitlement bucket; each model has its own remaining quota
- Prepaid-balance users — debit against your balance at the active model's per-token price
Each session accumulates token counts and an estimated USD spend locally; when that estimate crosses a built-in threshold, a "cost threshold" dialog pops up once.
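The two debit paths listed above can be pictured with a minimal sketch. Everything named here (`Account`, `debit`, the bucket size, the per-token price) is hypothetical and only illustrates the accounting; the real metering happens in the gateway, and the sketch does not model an empty prepaid balance.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account model, not CrabCode's actual data structure."""
    subscriber: bool
    # Subscribers: remaining tokens per model bucket.
    buckets: dict = field(default_factory=dict)
    # Prepaid-balance users: remaining balance in USD.
    balance_usd: float = 0.0


def debit(account: Account, model: str, tokens: int, usd_per_token: float) -> None:
    """Debit one metered call from the model's bucket or from the balance."""
    if account.subscriber:
        remaining = account.buckets.get(model, 0)
        if tokens > remaining:
            raise ValueError(f"quota insufficient for {model}")
        account.buckets[model] = remaining - tokens
    else:
        # Priced at the active model's per-token rate (made-up value below).
        account.balance_usd -= tokens * usd_per_token


sub = Account(subscriber=True, buckets={"small": 10_000})
debit(sub, "small", 2_500, usd_per_token=0.0)

prepaid = Account(subscriber=False, balance_usd=5.0)
debit(prepaid, "small", 2_000, usd_per_token=0.001)
```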
When you see this doc
- The "Learn more" link at the bottom of the cost-threshold dialog (fires once per session)
- The "Read more" link from /cost
Current session spend
/cost
Subscribers see the current quota status (allowed / running low / exhausted).
Other users see a session breakdown:
- Input / output / cache-read / cache-write token counts, broken down per model
- Cumulative estimated USD (priced at each model's current rate)
- Total API duration and wall-clock duration
- Total lines added / removed
The numbers come from a client-side accumulator and may drift slightly from the real bill on acosmi.com — the dashboard is authoritative.
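As a rough illustration of how a client-side accumulator could produce this breakdown, the sketch below keeps per-model token counts and prices them at a rate table. The class name, the model names, and every price are assumptions made for the example; they are not CrabCode's internals or real rates.

```python
from collections import defaultdict

# Hypothetical USD prices per million tokens; real rates come from the gateway.
PRICES = {
    "small": {"input": 1.0, "output": 4.0, "cache_read": 0.1, "cache_write": 1.25},
    "large": {"input": 5.0, "output": 20.0, "cache_read": 0.5, "cache_write": 6.25},
}


class SessionUsage:
    """Accumulates token counts per model and token kind for one session."""

    def __init__(self) -> None:
        # model -> token kind -> count
        self.tokens = defaultdict(lambda: defaultdict(int))

    def record(self, model: str, **counts: int) -> None:
        """Add the token counts reported for one model call."""
        for kind, n in counts.items():
            self.tokens[model][kind] += n

    def estimated_usd(self) -> float:
        """Price every accumulated count at the model's current rate."""
        total = 0.0
        for model, kinds in self.tokens.items():
            for kind, n in kinds.items():
                total += n * PRICES[model][kind] / 1_000_000
        return total


usage = SessionUsage()
usage.record("small", input=200_000, output=10_000, cache_read=500_000)
usage.record("large", input=50_000, output=5_000)
print(f"estimated USD: {usage.estimated_usd():.4f}")
```

Because the estimate is recomputed from a local rate table, any rate change or gateway-side adjustment shows up as drift against the dashboard.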
Per-model remaining quota
In the model picker (/model), each model shows its remaining %. The gateway aggregates this figure as remaining quota divided by total quota for that model's bucket. If you see "quota insufficient" for a model, switch to another model, or top up or upgrade your plan at acosmi.com.
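The remaining % is simple arithmetic, sketched below together with a fallback choice when a bucket runs short. The function names, the `(remaining, quota)` tuple layout, and the numbers are hypothetical; the picker's actual logic is not documented here.

```python
def remaining_pct(remaining: int, quota: int) -> float:
    """Remaining share of a model's bucket, as a percentage."""
    if quota <= 0:
        return 0.0
    return 100.0 * remaining / quota


def pick_model(buckets: dict, preferred: str, needed: int) -> str:
    """Keep the preferred model if its bucket covers `needed` tokens,
    otherwise fall back to the model with the most tokens left."""
    remaining, _quota = buckets[preferred]
    if remaining >= needed:
        return preferred
    return max(buckets, key=lambda model: buckets[model][0])


# model -> (remaining tokens, total quota); made-up values.
buckets = {"large": (1_000, 200_000), "small": (450_000, 500_000)}

large_pct = remaining_pct(*buckets["large"])
small_pct = remaining_pct(*buckets["small"])
choice = pick_model(buckets, preferred="large", needed=5_000)
```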
Cost-threshold reminder
CrabCode pops the cost dialog once when the estimated spend crosses a built-in threshold, nudging you to review your spend pace. The threshold is fixed and fires at most once per session; it isn't configurable in settings.json.
Acknowledge the dialog to keep going. For longer-term savings, use the tactics below.
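The once-per-session behavior amounts to a latch on the estimated spend. The sketch below is one way to express it; the class name and the 5.0 USD threshold are invented for the example, since the real threshold value is built in and not stated here.

```python
class CostThresholdReminder:
    """Fires at most once per session when the estimate reaches the threshold."""

    def __init__(self, threshold_usd: float) -> None:
        self.threshold_usd = threshold_usd
        self.shown = False

    def should_show(self, estimated_usd: float) -> bool:
        # Stay quiet below the threshold, and after the first firing.
        if self.shown or estimated_usd < self.threshold_usd:
            return False
        self.shown = True
        return True


reminder = CostThresholdReminder(threshold_usd=5.0)  # hypothetical value
fired = [reminder.should_show(spend) for spend in (1.0, 4.9, 5.2, 9.0)]
```

Only the third check fires; later checks stay silent even though the spend keeps growing.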
Saving money
| Tactic | Why |
|---|---|
| /clear to drop unrelated context | Long context means more tokens on every request |
| Use /model to switch to a smaller / cheaper model for routine work | Lower per-token price |
| Split tasks: exploratory Q&A on a small model, key edits on a larger one | Save expensive tokens for the critical path |
| Lean on prompt caching (gateway-enabled by default) | Highly repetitive prompts save on read tokens |
| Use subagents for bulk reading work | Keeps the main transcript free of tool-output noise |
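
To see why clearing context matters, consider a simplified model in which every request resends the whole transcript and each turn adds a fixed number of tokens. That model, the turn size, and the price are assumptions for illustration only, and the sketch ignores prompt caching.

```python
def input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when each request resends the full transcript."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))


PRICE_PER_MTOK = 3.0  # hypothetical input price, USD per million tokens

# 20 turns in one unbroken transcript.
no_clear = input_tokens(20, 2_000)
# The same 20 turns with /clear after turn 10: two transcripts of 10 turns.
with_clear = 2 * input_tokens(10, 2_000)

cost_no_clear = no_clear * PRICE_PER_MTOK / 1_000_000
cost_with_clear = with_clear * PRICE_PER_MTOK / 1_000_000
```

Under these assumptions the unbroken session sends 420,000 input tokens and the cleared one 220,000, so a single /clear roughly halves the input spend.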
Limits and caveats
- Local estimate: /cost is a client-side back-calculation at the model's rate and may differ from the acosmi.com bill; the dashboard is authoritative
- MCP / WebFetch tokens count toward the session total
- Subagent spend rolls up to the parent session
- Single billing entry — China region debits the acosmi.com balance, Global debits acosmi.ai (see providers/routing)
- Gateway-counted tokens are the truth — local numbers are a UI estimate; the gateway's metering is what bills
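The subagent roll-up in the list above can be sketched as a sum over the parent's own spend and that of its subagents. `SessionSpend`, the session names, and the figures are hypothetical and do not reflect how CrabCode stores sessions.

```python
from dataclasses import dataclass, field


@dataclass
class SessionSpend:
    """Hypothetical record of one session's estimated spend."""
    name: str
    own_usd: float
    subagents: list = field(default_factory=list)

    def total_usd(self) -> float:
        """Own spend plus everything rolled up from subagents."""
        return self.own_usd + sum(child.total_usd() for child in self.subagents)


parent = SessionSpend("main", 1.20, [
    SessionSpend("reader-1", 0.30),
    SessionSpend("reader-2", 0.25),
])
total = parent.total_usd()
```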