Costs

Check session usage, inspect remaining quota, and keep token spend in check.

What it is

CrabCode bills by token usage. Every model call is metered through the acosmi gateway — there is no direct billing relationship with upstream providers:

Subscribers — debit against your entitlement bucket; each model has its own remaining quota
Prepaid-balance users — debit against your balance at the active model's per-token price

Each session accumulates token counts and an estimated USD spend locally; when that estimate crosses a built-in threshold, a "cost threshold" dialog pops up once.

When you see this doc

The "Learn more" link at the bottom of the cost-threshold dialog (the dialog appears once per session)
The "Read more" link from /cost

Current session spend

/cost

Subscribers see the current quota status (allowed / running low / exhausted).

Other users see a session breakdown:

Input / output / cache-read / cache-write token counts, broken down per model
Cumulative estimated USD (priced at each model's current rate)
Total API duration and wall-clock duration
Total lines added / removed

The numbers come from a client-side accumulator and may drift slightly from the real bill on acosmi.com — the dashboard is authoritative.
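To illustrate why the local figure is only an estimate, here is a minimal sketch of a per-model accumulator of the kind described above. It is not CrabCode's actual implementation: the `SessionUsage` class, the `PRICES` table, the model name, and every rate in it are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical USD prices per million tokens. Real rates are set by the
# gateway and are not listed in this doc.
PRICES = {
    "small-model": {
        "input": 1.0,
        "output": 5.0,
        "cache_read": 0.1,
        "cache_write": 1.25,
    },
}


@dataclass
class SessionUsage:
    """Client-side running totals: tokens[model][kind] -> token count."""

    tokens: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(int))
    )

    def record(self, model, **counts):
        # Add the token counts reported for one model call.
        for kind, n in counts.items():
            self.tokens[model][kind] += n

    def estimated_usd(self):
        # Price every bucket at the model's current rate and sum the result.
        total = 0.0
        for model, kinds in self.tokens.items():
            for kind, n in kinds.items():
                total += n * PRICES[model][kind] / 1_000_000
        return total


usage = SessionUsage()
usage.record("small-model", input=1_000_000, output=200_000)
usage.record("small-model", input=500_000)
print(f"estimated spend: ${usage.estimated_usd():.2f}")
```

Because a sketch like this prices tokens with whatever rate table the client holds, any difference between that table and the gateway's metering shows up as drift from the real bill.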

In the model picker (/model), each model shows its remaining %: the gateway-aggregated ratio of remaining quota to total quota for that model's bucket. If you see "quota insufficient" for a model, switch to another model, or top up or upgrade your plan at acosmi.com.
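As a rough illustration of the remaining % figure, the sketch below computes remaining quota over total quota for one bucket. The function name, its parameters, and the clamping behavior are assumptions made for this example; the gateway's actual aggregation is not documented here.

```python
def remaining_percent(remaining, quota):
    """Return remaining / quota as a percentage, clamped to 0..100.

    Hypothetical helper: treats an empty or missing quota as 0% remaining.
    """
    if quota <= 0:
        return 0.0
    return max(0.0, min(100.0, 100.0 * remaining / quota))


print(remaining_percent(250, 1000))
```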

Cost-threshold reminder

CrabCode pops the cost dialog once when the estimated spend crosses a built-in threshold, nudging you to review your spend pace. The threshold is fixed and fires at most once per session; it isn't configurable in settings.json.

Acknowledge the dialog to keep going. For longer-term savings, use the tactics below.
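The once-per-session behavior can be sketched as a latch that trips the first time the estimate reaches the threshold and then stays quiet. The class name and the 5.00 USD value are hypothetical, and treating "crosses" as "greater than or equal to" is also an assumption; the real threshold is built in and not stated in this doc.

```python
class CostThresholdReminder:
    """Fires at most once per session when the estimate reaches a threshold."""

    def __init__(self, threshold_usd):
        self.threshold_usd = threshold_usd
        self.shown = False  # reset only by starting a new session

    def should_show(self, estimated_usd):
        # Stay silent if already shown or if still under the threshold.
        if self.shown or estimated_usd < self.threshold_usd:
            return False
        self.shown = True
        return True


reminder = CostThresholdReminder(threshold_usd=5.00)  # hypothetical value
for estimate in (1.20, 4.99, 5.01, 9.00):
    if reminder.should_show(estimate):
        print(f"cost threshold reached at ${estimate:.2f}")
```

In this sketch the dialog would appear at the 5.01 estimate and not again at 9.00, matching the "at most once per session" rule.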

Saving money

Tactic	Why
`/clear` to drop unrelated context	Long context means more tokens on every request
Use `/model` to switch to a smaller / cheaper model for routine work	Lower per-token price
Split tasks: exploratory Q&A on a small model, key edits on a larger one	Save expensive tokens for the critical path
Lean on prompt caching (gateway-enabled by default)	Repeated prompt prefixes are counted as cache-read tokens rather than fresh input tokens
Use subagents for bulk reading work	Keeps the main transcript free of tool-output noise

Limits and caveats

Local estimate — /cost is a client-side back-calculation at the model's rate and may differ from the acosmi.com bill; the dashboard is authoritative
MCP / WebFetch tokens count toward the session total
Subagent spend rolls up to the parent session
Single billing entry — China region debits the acosmi.com balance, Global debits acosmi.ai (see providers/routing)
Gateway-counted tokens are the truth — local numbers are a UI estimate; the gateway's metering is what bills