What gets charged
Exactly when a request is billed and when it is not - stream disconnects, client cancels, retries, zero-token responses, and every other edge case, matched to how the gateway actually works.
This page is the definitive answer to "is this request going to charge me?". Every scenario below matches how the gateway actually decides whether to deduct from your balance.
The one-line rule
You are charged when, and only when, the upstream provider reports usage back to us. No usage reported, no charge.
Everything else on this page is a corollary of that rule.
How the gateway computes a charge
For every successful response, the gateway:
- Reads the
usageobject from the response (or, for streams, from the final stream chunks). - Multiplies the reported token counts by the model's per-unit price to get a model cost.
- Adds any accumulated tool cost (built-in web search, web fetch, code interpreter calls that ran during this request).
- Subtracts the combined
model_cost + tool_costfrom your balance in a single atomic operation. - Logs the breakdown to your Usage and Logs pages.
If usage is missing or both counts are zero, no deduction happens,
the request is logged as a no-usage warning, and the model is flagged
internally so we can investigate.
Scenario-by-scenario reference
Non-streaming responses
Streaming responses
Streaming has more failure modes than non-streaming, and the billing behaviour is not always the obvious one. Read this section carefully - especially the client-cancel case.
Cancelling does not stop generation
HTTP cancellation only closes the channel between you and the gateway.
The upstream provider has no idea you cancelled and keeps running.
Treat every streaming request as a request to generate the full
response - if you only want N tokens, set max_completion_tokens to
N rather than relying on client-side cancel to save money.
Pre-flight rejections
These never reach the provider, so they never produce any usage to bill against.
Retries
The gateway has an internal retry loop for transient upstream failures (network glitches, 5xx responses, certain rate-limit conditions on the provider side).
Tools and tool calls
Cached input tokens
Summary table
| Scenario | Charged? |
|---|---|
| 200 response with usage | Yes - full cost |
| 200 response, no usage / zero tokens | No |
| 4xx or 5xx from provider | No |
| Stream completes normally | Yes - full cost |
| Stream errors out mid-flight (upstream drop, 5xx) | No |
| First-chunk or chunk-stall timeout | No |
| Client cancels mid-stream | Yes - full upstream cost |
| 401 / 402 / 429 / 400 from gateway | No |
| Gateway retry that eventually succeeds | Yes - once, for the success |
| Gateway retry that ultimately fails | No |
| Defined tool that never gets called | No tool fee |
| Built-in tool call succeeds | Yes - tool fee added |
| Built-in tool call fails | No tool fee |
How to verify any specific request
Every request the gateway processes shows up in the Console under Logs with:
- The full token breakdown (input, output, cached, cache write, tool cost).
- The exact dollar amount deducted (
$0.00if not charged). - The final HTTP status returned to you.
If you ever see a charge you cannot account for, find the request in Logs and the breakdown will show you exactly where the cost came from.