Models

Browse the model catalogue and learn how to pick the right one for your workload.

The Models page in the Console is the catalogue of everything you can call through the gateway - every text, image, embeddings, speech, transcription, and video model, with live pricing and the exact capabilities each one supports. This page explains how to read it so you can pick the right model in a few seconds instead of guessing from a name.

Start with Processed

The fastest way to pick a model is to look at the Processed column. It is the real-world popularity signal - the more tokens a model has processed on the gateway, the more other builders are trusting it for production traffic. When in doubt, sort by Most Popular and start from the top.

Processed: the popularity signal

Processed is the single most useful column for choosing a model, which is why it is worth understanding first. It shows the total volume that model has pushed through the gateway:

  • Text, embeddings - tokens processed (in + out), shown as 303.7M, 1.2B, etc.
  • Image - number of images generated.
  • Transcription - audio hours processed.
  • Video - video seconds generated.

A high number means the model is being used heavily in real workloads. That is a much better signal than a marketing name or a release date: it tells you what the community actually keeps coming back to. A brand-new model with low volume is not necessarily worse - it just has not been battle-tested on the platform yet.

Use it like this:

  1. Pick the category tab for the task (Text, Image, etc.).
  2. Leave the sort on Most Popular (the default).
  3. Scan the top rows - those are the proven workhorses.
  4. Compare their Input / Output price to find the cheapest one that still has the Capabilities you need.

Categories (tabs)

The tabs along the top split the catalogue by modality. The number badge on each tab is how many models are available in that category:

TabWhat lives here
TextChat & completion models (the bulk of the catalogue)
ImageImage-generation models
EmbeddingsVector / embedding models
TranscriptionSpeech-to-text models
SpeechText-to-speech models
VideoVideo-generation models

Each category shows a different set of columns because the billing unit differs - an image model has Sizes and Quality columns instead of Context and Max Output, for example.

Search, filter, and sort

Right under the tabs there are three controls:

  • Search - matches on both the display name (Gemini 2.5 Flash) and the model ID (gemini-2.5-flash). Paste either one.
  • All providers - narrow the list to a single provider (Google, DeepSeek, MiniMax, etc.).
  • Sort by - reorders the table:
Sort optionOrders byUse it when
Most Popular (default)Processed volume, high to lowYou want a proven, widely-used model
NewestRelease date, newest firstYou want the latest model a provider shipped
CheapestCombined input + output price, low to highCost is your primary constraint
ProviderKeeps provider groupingYou are comparing models from one vendor

A common flow is: filter to one provider, sort by Cheapest, then check the Processed column to avoid the cheap-but-unused outliers.

Reading the columns

For Text models the table shows, left to right:

ColumnWhat it tells you
ModelProvider icon, display name, and the copyable model ID. Click the ID to copy it for your API call. A discount badge like -25% appears when the model is on sale.
ContextMax input window (tokens the model can read in one request).
Max OutputMax tokens the model can generate in a single response.
ModalityInput (IN) and output (OUT) types - text, image, audio, video - shown as icons.
InputPrice per million input tokens (MTok).
OutputPrice per million output tokens (MTok).
CacheDiscounted price for cached (repeat) input, when the model supports prompt caching.
CapabilitiesIcons for the features the model supports (see below).
ProcessedUsage volume - the popularity signal described above.

Copy the model ID, not the name

The value you pass as model in an API request is the ID in the grey pill (e.g. gemini-2.5-flash), not the bold display name. Click the pill to copy it.

The Modality column

IN lists what the model can accept, OUT lists what it can return. Each icon has a tooltip - hover to see whether it means Text, Image, Audio, or Video. A model with image and audio icons under IN is multimodal and can take those as input; embeddings models hide the OUT row because they only return vectors.

The Capabilities column

These icons are a quick map of what the model can do. Hover any icon for its label:

IconCapabilityDescription
Function callingSupports tools / tool calling
Structured outputsCan return JSON / a fixed schema
Web searchSupports built-in web search
ReasoningHas an extended thinking / reasoning mode

If a model is missing the capability your app needs (for example, no icon when you rely on tool calling), skip it - the feature is not available no matter how cheap or popular it is.

Pricing

All token prices are quoted per million tokens (MTok). Models split into two prices, Input and Output, because output almost always costs more than input. For a full breakdown of how a request is billed, see How pricing works.

Discounts

When a model is discounted, the Console shows the original price struck through with the discounted price in green next to it, and a percentage badge (-25%) on the Model name. The green price is what you actually pay.

Tiered (volume) pricing

Some models price differently depending on how big a single request is - for example, prompts under 200K tokens are cheaper per token than prompts above it. When a model uses tiered pricing, its price cell has a dashed underline. Hover it and a small table appears showing the price for each context tier:

ContextInputOutput
≤200K$1.25/MTok$10.00/MTok
>200K$2.50/MTok$15.00/MTok

The number shown in the table cell itself is always the lowest tier (the best-case price); hover to see what larger requests cost. The same dashed- underline + hover pattern is used for image models that price per size/quality and video models that price per resolution.

Other categories at a glance

The non-text tabs swap in columns that match how that modality is billed:

  • Image - Sizes and Quality list the supported outputs; Output is priced per image (hover the dashed price to see per-size pricing).
  • Embeddings - shows Context and a single Input price; there is no output price because embeddings only return vectors.
  • Transcription - Input is priced per audio hour.
  • Speech - Output is priced per second of generated audio.
  • Video - Aspect Ratio and Resolution list supported outputs; Output is priced per second.

Tooltips everywhere

Anything with a dashed underline or an icon has a tooltip. Hover modality icons for their type, capability icons for their feature name, and any dashed price for the full tiered / per-variant breakdown.

FAQ