What gets charged

Exactly when a request is billed and when it is not - stream disconnects, client cancels, retries, zero-token responses, and every other edge case, matched to how the gateway actually works.

This page is the definitive answer to "is this request going to charge me?". Every scenario below matches how the gateway actually decides whether to deduct from your balance.

The one-line rule

You are charged when, and only when, the upstream provider reports usage back to us. No usage reported, no charge.

Everything else on this page is a corollary of that rule.

How the gateway computes a charge

For every successful response, the gateway:

  1. Reads the usage object from the response (or, for streams, from the final stream chunks).
  2. Multiplies the reported token counts by the model's per-unit price to get a model cost.
  3. Adds any accumulated tool cost (built-in web search, web fetch, code interpreter calls that ran during this request).
  4. Subtracts the combined model_cost + tool_cost from your balance in a single atomic operation.
  5. Logs the breakdown to your Usage and Logs pages.

If usage is missing or both counts are zero, no deduction happens, the request is logged as a no-usage warning, and the model is flagged internally so we can investigate.

Scenario-by-scenario reference

Non-streaming responses

Streaming responses

Streaming has more failure modes than non-streaming, and the billing behaviour is not always the obvious one. Read this section carefully - especially the client-cancel case.

Cancelling does not stop generation

HTTP cancellation only closes the channel between you and the gateway. The upstream provider has no idea you cancelled and keeps running. Treat every streaming request as a request to generate the full response - if you only want N tokens, set max_completion_tokens to N rather than relying on client-side cancel to save money.

Pre-flight rejections

These never reach the provider, so they never produce any usage to bill against.

Retries

The gateway has an internal retry loop for transient upstream failures (network glitches, 5xx responses, certain rate-limit conditions on the provider side).

Tools and tool calls

Cached input tokens

Summary table

ScenarioCharged?
200 response with usageYes - full cost
200 response, no usage / zero tokensNo
4xx or 5xx from providerNo
Stream completes normallyYes - full cost
Stream errors out mid-flight (upstream drop, 5xx)No
First-chunk or chunk-stall timeoutNo
Client cancels mid-streamYes - full upstream cost
401 / 402 / 429 / 400 from gatewayNo
Gateway retry that eventually succeedsYes - once, for the success
Gateway retry that ultimately failsNo
Defined tool that never gets calledNo tool fee
Built-in tool call succeedsYes - tool fee added
Built-in tool call failsNo tool fee

How to verify any specific request

Every request the gateway processes shows up in the Console under Logs with:

  • The full token breakdown (input, output, cached, cache write, tool cost).
  • The exact dollar amount deducted ($0.00 if not charged).
  • The final HTTP status returned to you.

If you ever see a charge you cannot account for, find the request in Logs and the breakdown will show you exactly where the cost came from.