Credits

How your prepaid dollar balance works, top-ups, per-key spending limits, and what happens when you run out.

Credits on Aivene are a prepaid dollar balance, not a request quota. If you top up $20, you have $20 sitting in your account. Each successful request subtracts its computed cost from that balance.

There is no monthly fee, no subscription, and no commitment.

How charges work

Before any request reaches the provider, the gateway checks your balance. If it is above the minimum threshold, the request goes through. After the response comes back, the cost is computed from the usage field and deducted from your balance.

balance_before = $10.00
request cost   = $0.0135
balance_after  = $9.9865
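The cost figure above comes from the response's usage field. Below is a minimal sketch of that arithmetic. It assumes OpenAI-style `prompt_tokens` / `completion_tokens` fields and illustrative prices of $3.00 per million input tokens and $15.00 per million output tokens; these are not Aivene's actual price list. It uses `Decimal` so fractions of a cent are not distorted by binary floating point.

```python
from decimal import Decimal

# Hypothetical per-million-token prices; real prices depend on the model.
INPUT_PRICE_PER_M = Decimal("3.00")
OUTPUT_PRICE_PER_M = Decimal("15.00")
MILLION = Decimal(1_000_000)


def request_cost(usage: dict) -> Decimal:
    """Dollar cost of one request from an (assumed) OpenAI-style usage object."""
    prompt = Decimal(usage["prompt_tokens"])
    completion = Decimal(usage["completion_tokens"])
    return prompt * INPUT_PRICE_PER_M / MILLION + completion * OUTPUT_PRICE_PER_M / MILLION


balance_before = Decimal("10.00")
# 1,000 input tokens and 700 output tokens at the hypothetical prices above.
cost = request_cost({"prompt_tokens": 1000, "completion_tokens": 700})
balance_after = balance_before - cost
print(cost, balance_after)  # 0.0135 9.9865
```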

The deduction is atomic and happens once per request. There is no daily batch and no surprise reconciliation later.
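A minimal in-process sketch of the check-then-charge flow described above. `Wallet`, `InsufficientBalance`, and the zero threshold are hypothetical names chosen for illustration. A real gateway would make the deduction atomic in its datastore rather than with a Python lock. Here the lock makes each deduction one indivisible step, so two concurrent requests cannot both read the same balance and lose an update.

```python
import threading
from decimal import Decimal


class InsufficientBalance(Exception):
    """Raised by the pre-flight check; a gateway would map this to HTTP 402."""


class Wallet:
    # Hypothetical threshold: "effectively zero".
    MIN_THRESHOLD = Decimal("0")

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def preflight(self) -> None:
        """Runs before the request is forwarded; nothing is charged here."""
        with self._lock:
            if self.balance <= self.MIN_THRESHOLD:
                raise InsufficientBalance(
                    "Insufficient credit balance. Please top up your account.")

    def charge(self, cost: Decimal) -> Decimal:
        """Deducts the final cost exactly once, after the provider responds."""
        with self._lock:
            self.balance -= cost
            return self.balance


wallet = Wallet(Decimal("10.00"))
wallet.preflight()                        # balance is above the threshold, so it passes
print(wallet.charge(Decimal("0.0135")))   # 9.9865
```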

When the balance hits zero

When your balance reaches the minimum threshold (effectively zero), every new request returns:

HTTP/1.1 402 Payment Required

{
  "error": {
    "message": "Insufficient credit balance. Please top up your account.",
    "type": "insufficient_balance"
  }
}

The request never reaches the provider, no tokens are generated, and nothing is charged. Top up and requests resume immediately - there is no cooldown or queue to wait through.
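A 402 clears only after a top-up, so retrying it with backoff just repeats the same rejection. Below is a minimal client-side retry sketch. It treats 429 and common 5xx statuses as transient, which is a general convention and an assumption, not something this page specifies. `send` is a hypothetical callable that returns `(status, json_body)`.

```python
import time
from typing import Callable, Tuple

# Statuses treated as transient here (assumption). 402 is deliberately absent:
# waiting does not fix an empty balance.
RETRYABLE = {429, 500, 502, 503, 504}


def send_with_retry(send: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 3,
                    base_delay: float = 0.5) -> Tuple[int, dict]:
    """Call `send`, retrying transient failures with exponential backoff.

    Any non-retryable status, including 402, is returned immediately so the
    caller can surface it (e.g. prompt for a top-up).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        time.sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```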

No runaway overage

The balance can dip slightly below zero, but it cannot keep falling. The gateway only knows a request's final cost after the provider responds, so if a single request costs more than your remaining balance, it still completes and is charged in full; the next request after that is blocked. Worst case, your balance ends up a few cents below zero.
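A small simulation of that worst case, assuming requests arrive one at a time and the threshold is zero; the costs are made up. The check only sees the balance before each request, so the request that crosses zero still runs and is billed, and every request after it is rejected without a charge.

```python
from decimal import Decimal
from typing import List, Tuple


def simulate(balance: Decimal, costs: List[Decimal]) -> Tuple[Decimal, int, int]:
    """Replay sequential requests against a balance.

    Returns (final_balance, served, rejected).
    """
    served = rejected = 0
    for cost in costs:
        if balance <= 0:       # pre-flight check fails: 402, nothing charged
            rejected += 1
            continue
        balance -= cost        # request runs; its full cost is deducted afterwards
        served += 1
    return balance, served, rejected


costs = [Decimal("0.02"), Decimal("0.02"), Decimal("0.03"), Decimal("0.01")]
final, served, rejected = simulate(Decimal("0.05"), costs)
print(final, served, rejected)  # -0.02 3 1
```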

Credits do not expire

What you top up stays in your account indefinitely. There is no rolling window, no quarterly reset, and no "use it or lose it" clause.

Per-key spending limits

The account balance is the global cap, but each API key can also carry its own independent spending limit. Once a key has spent that amount in its configured period, it stops working - even if the account still has plenty of credits.

Field             Meaning
spendLimit        Hard ceiling, in dollars
spendLimitPeriod  daily, weekly, monthly, or total (lifetime)

Examples:

  • A key with $5 / daily stops once it has spent $5 today and resets at midnight UTC.
  • A key with $100 / total stops permanently once it has spent $100.
  • A key with no limit set inherits only the account-level balance check.
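The two gates can be sketched as follows. Only the daily reset at midnight UTC is documented above. The weekly (Monday 00:00 UTC) and monthly (1st of the month, 00:00 UTC) boundaries are assumptions for illustration. `period_start` and `key_allowed` are hypothetical helpers, and `spent_in_period` is whatever the key has spent since the window start.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Optional


def period_start(period: str, now: datetime) -> Optional[datetime]:
    """Start of the current spend window, in UTC.

    "daily" resets at midnight UTC (documented). The weekly and monthly
    boundaries are assumptions. "total" never resets, so it returns None.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight
    if period == "weekly":
        return midnight - timedelta(days=midnight.weekday())  # back to Monday
    if period == "monthly":
        return midnight.replace(day=1)
    if period == "total":
        return None
    raise ValueError(f"unknown spendLimitPeriod: {period!r}")


def key_allowed(account_balance: Decimal, spent_in_period: Decimal,
                spend_limit: Optional[Decimal]) -> bool:
    """Both gates must pass: the account balance and, if set, the key's own limit."""
    if account_balance <= 0:
        return False  # account-level 402 (insufficient_balance)
    if spend_limit is not None and spent_in_period >= spend_limit:
        return False  # key-level 402 (spend_limit_exceeded)
    return True
```

A key with no limit (`spend_limit=None`) is checked only against the account balance.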

When a per-key limit is reached, the response carries the same 402 status as for a zero-balance account, but the error message and type tell you it was the key limit, not the account:

HTTP/1.1 402 Payment Required

{
  "error": {
    "message": "API key spend limit reached. Limit: $5.00 per daily. Reset your limit or wait for the next period.",
    "type": "spend_limit_exceeded"
  }
}
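Because both cases share the 402 status, branch on `error.type` rather than on the status alone. Below is a minimal standard-library sketch. The two type strings come from the responses above. `classify_402` and the returned action strings are hypothetical placeholders for whatever your application does.

```python
import json


def classify_402(body: str) -> str:
    """Map a 402 response body to a next step using its `error.type`."""
    error_type = json.loads(body).get("error", {}).get("type")
    if error_type == "insufficient_balance":
        return "top up the account balance"
    if error_type == "spend_limit_exceeded":
        return "raise this key's limit or wait for the next period"
    return "unknown 402; inspect error.message"
```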

When to use per-key limits

  • Sandboxing experiments - put a $5/daily cap on a key you are using to debug, so a runaway loop maxes out at $5 instead of your whole balance.
  • Per-environment caps - production key gets the full balance, staging key gets $20/monthly.
  • Sharing with teammates - hand out a key with a $50/monthly cap without giving them the keys to the vault.

You can adjust or remove a key's limit at any time from the Console.

Topping up

Top-ups go straight into the same dollar balance. There is no separate "prepaid" vs "earned" bucket: one number, one wallet, drawn down by each request in turn.

Sustained usage of the platform also moves you to higher rate-limit tiers automatically. Tiers do not change per-token pricing; they only widen the throughput pipe. See Rate limits for how tiers work.

Refunds and failed requests

If a request fails before the provider produces any usage (auth error, rate limit, upstream disconnect mid-stream, validation error), you are not charged at all - the deduction step never runs. There is nothing to refund because nothing was taken.

There is one important exception: cancelling a streaming request from the client side does NOT save money. The gateway drains the rest of the upstream stream in the background and bills the full cost, because the provider keeps generating after you disconnect. To actually limit spend on a streaming call, use max_completion_tokens. See What gets charged for the full breakdown of every failure mode and exactly when a charge does or does not happen.
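Cancelling a stream does not lower the bill, so the cap on output tokens is what bounds a call's cost. Below is a rough upper-bound sketch. It assumes the provider treats `max_completion_tokens` as a hard cap on generated tokens and that you can estimate the prompt's token count. The prices passed in ($3.00 / $15.00 per million tokens) are illustrative, not Aivene's.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def max_cost(prompt_tokens: int, max_completion_tokens: int,
             input_price_per_m: Decimal, output_price_per_m: Decimal) -> Decimal:
    """Worst-case dollar cost of one call whose output is capped by max_completion_tokens."""
    return (Decimal(prompt_tokens) * input_price_per_m
            + Decimal(max_completion_tokens) * output_price_per_m) / MILLION


# Hypothetical prices: $3.00 per million input tokens, $15.00 per million output tokens.
bound = max_cost(2_000, 500, Decimal("3.00"), Decimal("15.00"))
print(bound)  # 0.0135
```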