Billing

How credits, token pricing, and rate limits work - and why "how many requests can I get?" is the wrong question to ask.

Most people show up expecting AI APIs to work like a subscription - pay $X, get Y requests. AI gateways do not work that way. This is the foundation page you should read before anything else.

If you only read one thing on this page, read this:

The two systems are separate

Credits are your prepaid balance. They control how much you can spend. Rate limits control how fast you can spend it. You can run out of credits with rate limits untouched, and you can hit rate limits with a full balance. They are two different ceilings.
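One way to picture the two ceilings is as two independent checks that every request has to pass. The sketch below is a toy model, not the gateway's actual logic: the function name, its parameters, the single requests-per-minute limit, and the order of the checks are all assumptions made for illustration.

```python
from typing import Optional


def blocking_ceiling(
    balance_usd: float,
    estimated_cost_usd: float,
    requests_this_minute: int,
    requests_per_minute_limit: int,
) -> Optional[str]:
    """Return which ceiling would block the next request, or None if it may proceed.

    Toy model: every name here and the single per-minute limit are
    illustrative assumptions, not the gateway's real implementation.
    """
    # Ceiling 1: credits. The prepaid balance cannot cover the request.
    if balance_usd < estimated_cost_usd:
        return "credits"
    # Ceiling 2: rate limit. The current window is already used up.
    if requests_this_minute >= requests_per_minute_limit:
        return "rate_limit"
    return None


# Full balance, but the window is exhausted: blocked by the rate limit.
print(blocking_ceiling(100.0, 0.01, 60, 60))
# Empty balance, rate limit untouched: blocked by credits.
print(blocking_ceiling(0.0, 0.01, 0, 60))
```

Neither argument pair influences the other check, which is the point: raising one ceiling does nothing for the other.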

Why "how many requests can I get?" has no answer

This is the most common question we get, and it has no fixed answer.

The price of one request depends on:

Which model you call. Flagship models can be 30x more expensive per token than their smaller siblings. Same request, very different bill.
How many input tokens you send. A 200-token prompt and a 200,000-token prompt are not priced the same.
How many output tokens the model writes back. The length is largely decided by the model, not by you. A request that returns a one-word answer costs a tiny fraction of one that returns a 4,000-token essay.
Whether prompt caching kicks in. Cached input tokens are billed at a steep discount.
Whether you used any built-in tools. Web search, web fetch, and code interpreter each have their own per-call price on top of the model tokens.

So instead of "how many requests will $10 give me?", the better question is: "how much will my requests cost?" - and the answer comes from the model's per-token pricing applied to your actual usage.
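As a rough sketch of "per-token pricing applied to your actual usage", the function below adds up the factors listed above. Every price in it is a made-up placeholder, and `ModelPrice` and `estimate_cost` are hypothetical names chosen for this example; the real rates and formula live on the Pricing model page.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """Hypothetical prices in USD per million tokens (placeholders, not real rates)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def estimate_cost(
    price: ModelPrice,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    tool_fees_usd: float = 0.0,
) -> float:
    """Estimate one request's cost: tokens times per-token price, plus tool fees."""
    # Cached input tokens are billed at the discounted rate, the rest at full rate.
    uncached_input_tokens = input_tokens - cached_input_tokens
    return (
        uncached_input_tokens * price.input_per_mtok / 1_000_000
        + cached_input_tokens * price.cached_input_per_mtok / 1_000_000
        + output_tokens * price.output_per_mtok / 1_000_000
        + tool_fees_usd
    )


# Two made-up models, one 30x pricier per token than the other.
small = ModelPrice(input_per_mtok=0.50, cached_input_per_mtok=0.05, output_per_mtok=2.00)
flagship = ModelPrice(input_per_mtok=15.00, cached_input_per_mtok=1.50, output_per_mtok=60.00)

for name, price in [("small", small), ("flagship", flagship)]:
    cost = estimate_cost(price, input_tokens=2_000, output_tokens=500)
    print(f"{name}: ${cost:.4f} per request, roughly {round(10 / cost)} such requests for $10")
```

The same 2,000-token prompt and 500-token reply produce very different totals depending on the model, so a requests-per-dollar figure only exists once the model and the token counts are fixed.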

For the full pricing formula and a worked example, see Pricing model.

Credits vs rate limits, side by side

| | Credits | Rate limits |
| --- | --- | --- |
| Unit | US dollars | Requests and tokens per window |
| Question it answers | How much can I spend? | How fast can I spend it? |
| What happens when you hit it | `402 Payment Required` | `429 Too Many Requests` |
| Refilled by | Topping up | Time passing |
| Counted per | Account (or per-key cap) | API key + model |
| Affects price per request | No - sets the wallet size | No - sets the throughput |

You can be blocked by either. Typical patterns:

Hobby project: rarely hits rate limits, eventually runs out of credits.
Bursty production app: has plenty of credits, gets throttled by RPM (requests per minute) during traffic spikes.
High-volume batch job: hits TPD (tokens per day) long before the balance gets close to zero.

The fix is different in each case: top up for the first, upgrade your tier (or spread out the burst) for the second, and spread the job across days or request a tier bump for the third.
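On the client side, the two status codes call for different reactions. The following is a minimal sketch of one possible policy, assuming exponential backoff for `429` and no retry for `402`. The function name, the action labels, and the delay values are illustrative choices, not official guidance.

```python
from typing import Optional, Tuple


def next_step(
    status_code: int,
    attempt: int,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
) -> Tuple[str, Optional[float]]:
    """Map a response status to a client action (illustrative policy only)."""
    if status_code == 402:
        # Out of credits: waiting will not help, only a top-up will.
        return ("stop", None)
    if status_code == 429:
        # Rate limited: the window refills with time, so back off and retry.
        delay = min(max_delay_s, base_delay_s * 2 ** attempt)
        return ("retry", delay)
    # Any other status is outside the scope of this sketch.
    return ("proceed", None)


print(next_step(402, attempt=0))
print(next_step(429, attempt=3))
```

Retrying a `402` in a loop only burns time, while treating a `429` as fatal throws away a request that would have succeeded a few seconds later.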

Where to go next

Pricing model - how a single request is priced, with worked examples, caching discounts, and tool fees.
Credits - your prepaid dollar balance, top-ups, and per-key spending limits.
Rate limits - RPM, RPD, TPM, and TPD ceilings, plus how tiers raise them.
What gets charged - the definitive answer to "is this request going to charge me?", covering stream disconnects, retries, zero-token responses, and every other edge case.

FAQ

I bought $10 of credits, how many requests is that?

Does a 402 or 429 response cost me anything?

Why does the same prompt cost different amounts on different days?

Do credits expire?

If I switch from one model to a cheaper one, does my rate limit change?

What happens if a streaming response disconnects halfway?

Does topping up more change the price of a model?