Models

Browse the model catalogue and learn how to pick the right one for your workload.

The Models page in the Console is the catalogue of everything you can call through the gateway - every text, image, embeddings, speech, transcription, and video model, with live pricing and the exact capabilities each one supports. This page explains how to read it so you can pick the right model in a few seconds instead of guessing from a name.

Start with Processed

The fastest way to pick a model is to look at the Processed column. It is the real-world popularity signal - the more tokens a model has processed on the gateway, the more other builders are trusting it for production traffic. When in doubt, sort by Most Popular and start from the top.

Processed: the popularity signal

Processed is the single most useful column for choosing a model, which is why it is worth understanding first. It shows the total volume that model has pushed through the gateway:

Text, embeddings - tokens processed (in + out), shown as 303.7M, 1.2B, etc.
Image - number of images generated.
Transcription - audio hours processed.
Video - video seconds generated.

A high number means the model is being used heavily in real workloads. That is a much better signal than a marketing name or a release date: it tells you what the community actually keeps coming back to. A brand-new model with low volume is not necessarily worse - it just has not been battle-tested on the platform yet.

Use it like this:

Pick the category tab for the task (Text, Image, etc.).
Leave the sort on Most Popular (the default).
Scan the top rows - those are the proven workhorses.
Compare their Input / Output price to find the cheapest one that still has the Capabilities you need.

Categories (tabs)

The tabs along the top split the catalogue by modality. The number badge on each tab is how many models are available in that category:

Tab	What lives here
Text	Chat & completion models (the bulk of the catalogue)
Image	Image-generation models
Embeddings	Vector / embedding models
Transcription	Speech-to-text models
Speech	Text-to-speech models
Video	Video-generation models

Each category shows a different set of columns because the billing unit differs - an image model has Sizes and Quality columns instead of Context and Max Output, for example.

Search, filter, and sort

Right under the tabs there are three controls:

Search - matches on both the display name (Gemini 2.5 Flash) and the model ID (gemini-2.5-flash). Paste either one.
All providers - narrow the list to a single provider (Google, DeepSeek, MiniMax, etc.).
Sort by - reorders the table:

Sort option	Orders by	Use it when
Most Popular (default)	Processed volume, high to low	You want a proven, widely-used model
Newest	Release date, newest first	You want the latest model a provider shipped
Cheapest	Combined input + output price, low to high	Cost is your primary constraint
Provider	Keeps provider grouping	You are comparing models from one vendor

A common flow is: filter to one provider, sort by Cheapest, then check the Processed column to avoid the cheap-but-unused outliers.

Reading the columns

For Text models the table shows, left to right:

Column	What it tells you
Model	Provider icon, display name, and the copyable model ID. Click the ID to copy it for your API call. A discount badge like `-25%` appears when the model is on sale.
Context	Max input window (tokens the model can read in one request).
Max Output	Max tokens the model can generate in a single response.
Modality	Input (`IN`) and output (`OUT`) types - text, image, audio, video - shown as icons.
Input	Price per million input tokens (MTok).
Output	Price per million output tokens (MTok).
Cache	Discounted price for cached (repeat) input, when the model supports prompt caching.
Capabilities	Icons for the features the model supports (see below).
Processed	Usage volume - the popularity signal described above.

Copy the model ID, not the name

The value you pass as model in an API request is the ID in the grey pill (e.g. gemini-2.5-flash), not the bold display name. Click the pill to copy it.

The Modality column

IN lists what the model can accept, OUT lists what it can return. Each icon has a tooltip - hover to see whether it means Text, Image, Audio, or Video. A model with image and audio icons under IN is multimodal and can take those as input; embeddings models hide the OUT row because they only return vectors.

The Capabilities column

These icons are a quick map of what the model can do. Hover any icon for its label:

Icon	Capability	Description
	Function calling	Supports `tools` / tool calling
	Structured outputs	Can return JSON / a fixed schema
	Web search	Supports built-in web search
	Reasoning	Has an extended thinking / reasoning mode

If a model is missing the capability your app needs (for example, no icon when you rely on tool calling), skip it - the feature is not available no matter how cheap or popular it is.

Pricing

All token prices are quoted per million tokens (MTok). Models split into two prices, Input and Output, because output almost always costs more than input. For a full breakdown of how a request is billed, see How pricing works.

Discounts

When a model is discounted, the Console shows the original price struck through with the discounted price in green next to it, and a percentage badge (-25%) on the Model name. The green price is what you actually pay.

Tiered (volume) pricing

Some models price differently depending on how big a single request is - for example, prompts under 200K tokens are cheaper per token than prompts above it. When a model uses tiered pricing, its price cell has a dashed underline. Hover it and a small table appears showing the price for each context tier:

Context	Input	Output
≤200K	$1.25/MTok	$10.00/MTok
>200K	$2.50/MTok	$15.00/MTok

The number shown in the table cell itself is always the lowest tier (the best-case price); hover to see what larger requests cost. The same dashed- underline + hover pattern is used for image models that price per size/quality and video models that price per resolution.

Other categories at a glance

The non-text tabs swap in columns that match how that modality is billed:

Image - Sizes and Quality list the supported outputs; Output is priced per image (hover the dashed price to see per-size pricing).
Embeddings - shows Context and a single Input price; there is no output price because embeddings only return vectors.
Transcription - Input is priced per audio hour.
Speech - Output is priced per second of generated audio.
Video - Aspect Ratio and Resolution list supported outputs; Output is priced per second.

Tooltips everywhere

Anything with a dashed underline or an icon has a tooltip. Hover modality icons for their type, capability icons for their feature name, and any dashed price for the full tiered / per-variant breakdown.

Processed: the popularity signal

Categories (tabs)

Search, filter, and sort

Reading the columns

The Modality column

The Capabilities column

Pricing

Discounts

Tiered (volume) pricing

Other categories at a glance

FAQ

What does the Processed number actually count?

Why is a model I expected missing from a tab?

Which value do I pass to the API?

Why does the price have a dashed underline?

How do I find the cheapest capable model?