Billing
How credits, token pricing, and rate limits work - and why "how many requests can I get?" is the wrong question to ask.
Most people show up expecting AI APIs to work like a subscription - pay $X, get Y requests. AI gateways do not work that way. This is the foundation page you should read before anything else.
If you only read one thing on this page, read this:
The two systems are separate
Credits are your prepaid balance. They control how much you can spend. Rate limits control how fast you can spend it. You can run out of credits with rate limits untouched, and you can hit rate limits with a full balance. They are two different ceilings.
Why "how many requests can I get?" has no answer
This is the most common question we get, and it has no fixed answer.
The price of one request depends on:
- Which model you call. Flagship models can be 30x more expensive per token than their smaller siblings. Same request, very different bill.
- How many input tokens you send. A 200-token prompt and a 200,000-token prompt are not priced the same.
- How many output tokens the model writes back. This is decided by the model, not by you. A request that returns a one-word answer costs a tiny fraction of one that returns a 4,000-token essay.
- Whether prompt caching kicks in. Cached input tokens are billed at a steep discount.
- Whether you used any built-in tools. Web search, web fetch, and code interpreter each have their own per-call price on top of the model tokens.
So instead of "how many requests will $10 give me?", the better question is: "how much will my requests cost?" - and the answer comes from the model's per-token pricing applied to your actual usage.
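As a rough illustration of the factors listed above, the sketch below estimates the cost of one request from per-token prices. Everything named here is hypothetical: the `ModelPrice` fields, the dollar figures, and the assumption that cost is a plain linear sum in which cached tokens are a subset of input tokens. Real prices and the exact formula may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical prices in US dollars per million tokens."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def estimate_request_cost(
    price: ModelPrice,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    tool_fees: float = 0.0,
) -> float:
    """Estimate one request's cost in dollars (assumed linear formula)."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    token_cost = (
        uncached * price.input_per_mtok
        + cached_input_tokens * price.cached_input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000
    # Tool fees (web search, code interpreter, ...) are added per call on top.
    return token_cost + tool_fees


# Made-up prices: the flagship is 30x the small model on every line item.
small = ModelPrice(input_per_mtok=0.50, cached_input_per_mtok=0.05, output_per_mtok=1.50)
flagship = ModelPrice(input_per_mtok=15.00, cached_input_per_mtok=1.50, output_per_mtok=45.00)

for name, price in (("small", small), ("flagship", flagship)):
    cost = estimate_request_cost(price, input_tokens=2_000, output_tokens=500)
    print(f"{name}: ${cost:.5f} per request, about {int(10 / cost)} requests for $10")
```

With these made-up prices, the same 2,000-token-in, 500-token-out request costs $0.00175 on the small model and $0.0525 on the flagship, so $10 covers roughly 5,700 requests on one and roughly 190 on the other. That spread is why a request count cannot be quoted up front.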
For the full pricing formula and a worked example, see Pricing model.
Credits vs rate limits, side by side
| | Credits | Rate limits |
|---|---|---|
| Unit | US dollars | Requests and tokens per window |
| Question it answers | How much can I spend? | How fast can I spend it? |
| What happens when you hit it | 402 Payment Required | 429 Too Many Requests |
| Refilled by | Topping up | Time passing |
| Counted per | Account (or per-key cap) | API key + model |
| Affects price per request | No - sets the wallet size | No - sets the throughput |
You can be blocked by either one. Typical patterns:
- Hobby project: rarely hits rate limits, eventually runs out of credits.
- Bursty production app: has plenty of credits, gets throttled by its requests-per-minute (RPM) limit during traffic spikes.
- High-volume batch job: hits its tokens-per-day (TPD) limit long before the balance gets close to zero.
The fix is different in each case: top up for the first, upgrade your tier (or spread out the burst) for the second, and spread the job across days or request a tier bump for the third.
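Because the two ceilings call for different fixes, client code usually branches on the status code rather than retrying blindly. The sketch below is one way to do that, under assumptions: the `next_step` helper and its action names are hypothetical, and whether the gateway sends a `Retry-After` value on a 429 is not stated here, so the function falls back to exponential backoff with jitter when none is supplied.

```python
import random
from typing import Optional, Tuple


def next_step(
    status_code: int,
    attempt: int,
    retry_after: Optional[float] = None,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> Tuple[str, float]:
    """Decide what to do after a response. Returns (action, wait_seconds)."""
    if status_code == 402:
        # Out of credits: waiting does not help, only a top-up does.
        return ("top_up", 0.0)
    if status_code == 429:
        # Rate limited: time passing refills the window, so wait and retry.
        if retry_after is not None:
            return ("wait", retry_after)
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: pick a random wait up to the backoff ceiling.
        return ("wait", random.uniform(0.0, ceiling))
    # Any other status is outside the scope of this sketch.
    return ("continue", 0.0)
```

A caller would sleep for the returned number of seconds on `"wait"` and surface a billing alert on `"top_up"`. Retrying a 402 in a loop only burns time, since the balance does not refill on its own.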
Where to go next
- Pricing model - how a single request is priced, with worked examples, caching discounts, and tool fees.
- Credits - your prepaid dollar balance, top-ups, and per-key spending limits.
- Rate limits - RPM, RPD, TPM, and TPD ceilings, plus how tiers raise them.
- What gets charged - the definitive answer to "is this request going to charge me?", covering stream disconnects, retries, zero-token responses, and every other edge case.