What gets charged

Exactly when a request is billed and when it is not - stream disconnects, client cancels, retries, zero-token responses, and every other edge case, matched to how the gateway actually works.

This page is the definitive answer to "is this request going to charge me?". Every scenario below matches how the gateway actually decides whether to deduct from your balance.

The one-line rule

You are charged when, and only when, the upstream provider reports usage back to us. No usage reported, no charge.

Everything else on this page is a corollary of that rule.

How the gateway computes a charge

For every successful response, the gateway:

Reads the usage object from the response (or, for streams, from the final stream chunks).
Multiplies the reported token counts by the model's per-unit price to get a model cost.
Adds any accumulated tool cost (built-in web search, web fetch, code interpreter calls that ran during this request).
Subtracts the combined model_cost + tool_cost from your balance in a single atomic operation.
Logs the breakdown to your Usage and Logs pages.

If usage is missing or both counts are zero, no deduction happens, the request is logged as a no-usage warning, and the model is flagged internally so we can investigate.

Scenario-by-scenario reference

Non-streaming responses

Streaming responses

Streaming has more failure modes than non-streaming, and the billing behaviour is not always the obvious one. Read this section carefully - especially the client-cancel case.

Cancelling does not stop generation

HTTP cancellation only closes the channel between you and the gateway. The upstream provider has no idea you cancelled and keeps running. Treat every streaming request as a request to generate the full response - if you only want N tokens, set max_completion_tokens to N rather than relying on client-side cancel to save money.

Pre-flight rejections

These never reach the provider, so they never produce any usage to bill against.

Retries

The gateway has an internal retry loop for transient upstream failures (network glitches, 5xx responses, certain rate-limit conditions on the provider side).

Tools and tool calls

Cached input tokens

Summary table

Scenario	Charged?
200 response with usage	Yes - full cost
200 response, no usage / zero tokens	No
4xx or 5xx from provider	No
Stream completes normally	Yes - full cost
Stream errors out mid-flight (upstream drop, 5xx)	No
First-chunk or chunk-stall timeout	No
Client cancels mid-stream	Yes - full upstream cost
401 / 402 / 429 / 400 from gateway	No
Gateway retry that eventually succeeds	Yes - once, for the success
Gateway retry that ultimately fails	No
Defined tool that never gets called	No tool fee
Built-in tool call succeeds	Yes - tool fee added
Built-in tool call fails	No tool fee

How to verify any specific request

Every request the gateway processes shows up in the Console under Logs with:

The full token breakdown (input, output, cached, cache write, tool cost).
The exact dollar amount deducted ($0.00 if not charged).
The final HTTP status returned to you.

If you ever see a charge you cannot account for, find the request in Logs and the breakdown will show you exactly where the cost came from.

The one-line rule

How the gateway computes a charge

Scenario-by-scenario reference

Non-streaming responses

Successful response (HTTP 200 with usage)

Provider returns 200 but no usage object

Provider returns 200 with zero tokens reported

Provider returns 4xx or 5xx error

Streaming responses

Stream completes normally

Upstream provider disconnects mid-stream (network drop, 5xx)

You (the client) close the connection mid-stream

First chunk never arrives (upstream hung)

Stream stalls between chunks for too long

Pre-flight rejections

Invalid or revoked API key (401)

Insufficient balance (402)

Per-key spend limit reached (402)

Rate limit exceeded (429)

Validation error in request body (400)

Retries

Gateway-internal retry that eventually succeeds

Gateway-internal retry that ultimately fails

Your own client-side retry after a 429 or 5xx

Tools and tool calls

You define a tool but the model never calls it

Model calls a built-in tool (web search, web fetch, code interpreter)

Model calls a built-in tool multiple times in one request

Built-in tool execution fails (e.g. web fetch times out)

Cached input tokens

Provider reports cached input tokens in the response

Provider reports cache write tokens

Summary table

How to verify any specific request