Rate limits
Per-model throughput ceilings - RPM, RPD, TPM, TPD - and how account tiers raise them. Separate from your credit balance.
Rate limits are a completely separate system from credits. They exist to keep upstream providers happy and to stop a runaway script from melting things. They do not charge you anything and they do not consume credits.
A rate limit answers the question "how many requests or tokens can flow through this key in a given window?". Credits answer "how much can I spend in total?".
The four ceilings
There are four limits, applied per model:
| Limit | Meaning | Reset |
|---|---|---|
| RPM | Requests per minute | every 60 seconds |
| RPD | Requests per day | midnight UTC |
| TPM | Tokens per minute (input + output combined) | every 60 seconds |
| TPD | Tokens per day | midnight UTC |
A request is blocked the moment any of those four ceilings is hit.
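The any-of-four rule can be sketched as a small check. The `Counters` shape and both function names are hypothetical, made up for illustration. Whether the gateway blocks at exactly the ceiling or just past it is an assumption here, and the real ceilings depend on your tier and model.

```typescript
// Hypothetical shapes: the names below are illustrative, not part of any API.
type Ceiling = "rpm" | "rpd" | "tpm" | "tpd";
type Counters = Record<Ceiling, number>;

// Returns the first ceiling whose usage has reached its limit,
// or null if none has (assumes "hit" means usage >= limit).
function firstCeilingHit(usage: Counters, limits: Counters): Ceiling | null {
  const order: Ceiling[] = ["rpm", "rpd", "tpm", "tpd"];
  for (const name of order) {
    if (usage[name] >= limits[name]) return name;
  }
  return null;
}

// Milliseconds until the daily counters (RPD, TPD) reset at midnight UTC.
function msUntilMidnightUtc(now: Date): number {
  // Date.UTC normalizes day overflow, so "day + 1" is safe at month end.
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return next - now.getTime();
}
```

A client could use `msUntilMidnightUtc(new Date())` to decide how long to park a batch job after running into RPD or TPD.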
When you hit one
The response is:
HTTP/1.1 429 Too Many Requests
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_exceeded"
  }
}

The request is not billed, no tokens are generated, and the limiter resets on its own schedule. There is nothing to "reset manually" - just wait for the window to roll over.
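For clients that do not use an SDK, a minimal sketch of recognizing this response from a status code and a parsed body might look like the following. The `isRateLimited` helper and the `ApiErrorBody` type are hypothetical names. Only the 429 status and the `rate_limit_exceeded` type come from the response shown above.

```typescript
// Hypothetical type mirroring the error body shown above.
interface ApiErrorBody {
  error?: { message?: string; type?: string };
}

// True when a response looks like the rate-limit error described above.
function isRateLimited(status: number, body: unknown): boolean {
  if (status !== 429) return false;
  const type = (body as ApiErrorBody | null | undefined)?.error?.type;
  // Assumption: a 429 with a missing or unparsable body is still treated
  // as a rate limit, since the status code alone already says so.
  return type === undefined || type === "rate_limit_exceeded";
}
```

With `fetch`, this would typically be called as `isRateLimited(res.status, await res.json().catch(() => null))`.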
How counting works
- RPM and RPD are incremented by 1 for every successful request that reaches the provider.
- TPM and TPD are incremented after the response by the total tokens (input + output) actually used. Cached tokens still count toward TPM/TPD.
- Requests that fail at the gateway (auth error, insufficient balance, rate limit itself) do not increment anything.
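A client that wants to estimate its own remaining headroom can mirror these counting rules locally. The sketch below is a hypothetical client-side model, not the gateway's implementation. The `Outcome` and `UsageCounters` names are invented for illustration, and `inputTokens` is assumed to already include cached tokens.

```typescript
// Hypothetical outcome of one call, as seen by the client.
type Outcome =
  | { kind: "gateway_rejected" } // auth error, insufficient balance, or 429
  | { kind: "completed"; inputTokens: number; outputTokens: number };

// One pair of counters per window (minute or day).
interface UsageCounters {
  requests: number; // feeds RPM / RPD
  tokens: number;   // feeds TPM / TPD
}

// Returns updated counters following the rules listed above.
function applyOutcome(counters: UsageCounters, outcome: Outcome): UsageCounters {
  // Requests rejected at the gateway do not increment anything.
  if (outcome.kind === "gateway_rejected") return counters;
  return {
    requests: counters.requests + 1,
    // Tokens are added after the response: input + output combined.
    // Assumption: inputTokens includes cached tokens, which still count.
    tokens: counters.tokens + outcome.inputTokens + outcome.outputTokens,
  };
}
```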
Per-model, per-key
Rate limits are scoped to the combination of API key + model. That means:
- The same key calling gpt-5.4 and claude-sonnet-4.6 has independent limits for each model.
- A key hitting RPM on gpt-5.4 can still call any other model freely.
- Two different keys on the same account each get their own rate-limit bucket, so giving one key to staging and another to production keeps them from starving each other.
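One way to picture this scoping is a map of buckets keyed by the (key, model) pair. This is an illustrative sketch only: `bucketFor`, the `Bucket` shape, and the key identifiers are hypothetical, and the gateway's actual storage is not documented here.

```typescript
// Hypothetical per-bucket state; a real limiter would also track tokens and windows.
interface Bucket {
  requestsThisMinute: number;
}

const buckets = new Map<string, Bucket>();

// Returns the bucket for one API key + model combination, creating it on first use.
function bucketFor(apiKeyId: string, model: string): Bucket {
  // JSON.stringify of the pair avoids collisions between e.g. ("a:b", "c") and ("a", "b:c").
  const id = JSON.stringify([apiKeyId, model]);
  let bucket = buckets.get(id);
  if (bucket === undefined) {
    bucket = { requestsThisMinute: 0 };
    buckets.set(id, bucket);
  }
  return bucket;
}

// Same key, two models: usage on one does not touch the other.
bucketFor("key-staging", "gpt-5.4").requestsThisMinute += 1;
```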
Tiers raise the ceiling
Every account belongs to a tier - Free, Tier 1, Tier 2, or Tier 3. Higher tiers have higher RPM, RPD, TPM, and TPD ceilings on every model. Your tier moves up automatically as you use the platform more over time - there is no manual upgrade step.
Tiers do not change pricing
Moving to a higher tier raises your rate-limit ceilings but does not change the per-token price of any model. Tier is about throughput, not cost. The dollar cost of a given request is the same on Tier 1 and Tier 3.
To see your current tier and the exact ceilings on each model, check the Console under Account Settings.
Why this is separate from credits
You can run out of either one independently:
- Plenty of credits, hit RPM - typical for bursty production traffic. Fix: spread the burst or add client-side queueing. A higher tier also raises the ceiling, but tiers move up automatically rather than on request.
- Plenty of RPM headroom, ran out of credits - typical for steady background jobs. Fix: top up.
- Hit TPD on a single huge job - typical for batch processing long documents. Fix: spread the job across days. A higher tier raises TPD, but topping up credits does not by itself lift a rate limit.
This separation is on purpose. Credits stop you from spending more than you intended; rate limits stop you from overloading a provider or DOSing yourself with a bad loop. Mixing them into a single number would make both jobs harder.
Handling 429 in client code
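The OpenAI SDK and most other clients have built-in retry-with-backoff for 429. In the OpenAI Node SDK it is enabled by default with a small retry count; raise maxRetries if you need more attempts, and most spikes resolve themselves.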
import OpenAI from 'openai';

// Retry up to 3 times, with backoff, on retryable errors such as 429.
const client = new OpenAI({
  apiKey: process.env.AIVENE_API_KEY,
  baseURL: 'https://api.aivene.com/v1',
  maxRetries: 3,
});

For long-running batch jobs, prefer a token-aware queue (process N requests in parallel, sleep when 429 fires) over hammering the API and hoping retries pull you through.
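A minimal sketch of the "N in parallel, sleep on 429" part of such a queue is shown below. It is an illustration under assumptions, not a reference implementation. `runQueue` and its parameters are hypothetical names, and the default predicate assumes a thrown error carries a numeric `status` property (as OpenAI SDK errors are understood to). It does not track token counts, so a fully token-aware version would also need to budget against TPM and TPD.

```typescript
type Task<T> = () => Promise<T>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: rate-limit errors expose `status === 429`.
const hasStatus429 = (err: unknown): boolean =>
  typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;

// Runs tasks with at most `concurrency` in flight. When any task hits a 429,
// all workers pause for `pauseMs` before starting further attempts.
async function runQueue<T>(
  tasks: Task<T>[],
  concurrency: number,
  pauseMs: number,
  maxAttempts = 5,
  isRateLimit: (err: unknown) => boolean = hasStatus429,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;        // index of the next unclaimed task
  let pausedUntil = 0; // shared timestamp; workers wait until it has passed

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      for (let attempt = 1; ; attempt++) {
        const wait = pausedUntil - Date.now();
        if (wait > 0) await sleep(wait);
        try {
          results[index] = await tasks[index]();
          break;
        } catch (err) {
          // Give up on non-rate-limit errors or after too many attempts.
          if (!isRateLimit(err) || attempt >= maxAttempts) throw err;
          pausedUntil = Date.now() + pauseMs;
        }
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

When pairing this with an SDK client, it may make sense to lower the SDK's own `maxRetries` so the two retry layers do not multiply each other. Results come back in the same order as `tasks`.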