Video Understanding
Send videos to a chat model and ask questions about them.
Vision models that support video can accept video clips as part of a user message and reason over them alongside any text. Use this for video summarization, content analysis, action recognition, and visual Q&A on video content.
POST /v1/chat/completions
Which models support video?
Check for the video capability on GET /v1/models. Currently, Gemini models are supported.
Most other models do not support video input yet.
YouTube URL input
The easiest way to analyze a video is to pass a YouTube URL directly.
curl https://api.aivene.com/v1/chat/completions \
  -H "Authorization: Bearer $AIVENE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Summarize this video in 3 sentences." },
        { "type": "input_video", "input_video": { "url": "https://www.youtube.com/watch?v=VIDEO_ID" } }
      ]
    }]
  }'
Supported YouTube URL formats:
- https://www.youtube.com/watch?v=VIDEO_ID
- https://youtu.be/VIDEO_ID
- https://www.youtube.com/embed/VIDEO_ID
- https://www.youtube.com/shorts/VIDEO_ID
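To catch unsupported links before sending a request, the four URL shapes listed above can be checked on the client. The sketch below is a minimal example, not part of the Aivene API: `extractYouTubeId` is a hypothetical helper name, and it assumes only `https` URLs on the exact hosts shown above are accepted (the service may be more lenient).

```javascript
// Hypothetical client-side helper (not part of the Aivene API).
// Returns the video ID when the URL matches one of the four documented
// formats, otherwise null.
function extractYouTubeId(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable URL
  }
  // Assumption: only https URLs on the documented hosts are accepted.
  if (url.protocol !== 'https:') return null;

  let id = null;
  if (url.hostname === 'youtu.be') {
    // https://youtu.be/VIDEO_ID
    id = url.pathname.slice(1);
  } else if (url.hostname === 'www.youtube.com') {
    if (url.pathname === '/watch') {
      // https://www.youtube.com/watch?v=VIDEO_ID
      id = url.searchParams.get('v');
    } else {
      // https://www.youtube.com/embed/VIDEO_ID or /shorts/VIDEO_ID
      const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/]+)$/);
      id = match ? match[1] : null;
    }
  }
  // Assumption: IDs consist of letters, digits, "_" and "-".
  return id && /^[\w-]+$/.test(id) ? id : null;
}
```

A `null` result means the link does not match any documented format and is likely to be rejected.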
Limits
- Only public YouTube videos are supported
- Max 5 videos per request (inline or URL)
- Max 5 YouTube URLs per request
- Max 100 MB per video file
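These limits can be checked locally before a request is sent. The following is a rough sketch under stated assumptions: `checkVideoLimits` and `base64ByteLength` are hypothetical helper names, and "100 MB" is interpreted as 100 × 1024 × 1024 bytes, which the limits above do not specify.

```javascript
const MAX_VIDEOS = 5;
const MAX_YOUTUBE_URLS = 5;
const MAX_BYTES = 100 * 1024 * 1024; // assumption: "100 MB" means 100 MiB

// Decoded size in bytes of a base64 string, accounting for "=" padding.
function base64ByteLength(b64) {
  const padding = b64.endsWith('==') ? 2 : b64.endsWith('=') ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// Hypothetical pre-flight check: returns a list of human-readable problems
// for a message's content array (an empty list means no limit is exceeded).
function checkVideoLimits(content, maxBytes = MAX_BYTES) {
  const errors = [];
  const videos = content.filter((part) => part.type === 'input_video');
  const urlCount = videos.filter((v) => v.input_video.url).length;

  if (videos.length > MAX_VIDEOS) {
    errors.push(`Too many videos: ${videos.length} (max ${MAX_VIDEOS})`);
  }
  if (urlCount > MAX_YOUTUBE_URLS) {
    errors.push(`Too many YouTube URLs: ${urlCount} (max ${MAX_YOUTUBE_URLS})`);
  }
  for (const v of videos) {
    const data = v.input_video.data;
    if (!data) continue; // URL inputs have no inline payload to measure
    // Strip a data URL prefix ("data:video/mp4;base64,") if present.
    const b64 = data.startsWith('data:') ? data.slice(data.indexOf(',') + 1) : data;
    const size = base64ByteLength(b64);
    if (size > maxBytes) {
      errors.push(`Inline video is ${size} bytes (max ${maxBytes})`);
    }
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a caller report every violation at once.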
Base64 input
Embed video bytes inline as base64 using the input_video content type.
import { readFile } from 'node:fs/promises';
// Read the clip and encode it as plain base64 (no data: prefix)
const bytes = await readFile('clip.mp4');
const base64 = bytes.toString('base64');
const res = await fetch('https://api.aivene.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.AIVENE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'Describe what happens in this video.' },
        // Plain base64 carries no MIME type, so the format is passed explicitly
        { type: 'input_video', input_video: { data: base64, format: 'mp4' } }
      ]
    }]
  })
});
Data URL input
You can also pass a data URL with the MIME type prefix:
// bytes is the Buffer read from clip.mp4 in the Base64 input example
const dataUrl = `data:video/mp4;base64,${bytes.toString('base64')}`;
const res = await fetch('https://api.aivene.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.AIVENE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'What is the main subject of this video?' },
        { type: 'input_video', input_video: { data: dataUrl } }
      ]
    }]
  })
});
When using a data URL, the format field is optional because the MIME type is extracted from the URL prefix.
Custom content type
input_video is an Aivene extension that is not part of the OpenAI spec. Use fetch or any other HTTP client instead of the OpenAI SDK.
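Because the request is built by hand rather than through an SDK, it can help to keep the payload construction in one place. The sketch below is one possible shape, not an official client: `buildVideoRequest` is a hypothetical helper, and it only assembles the URL, headers, and body already shown in the examples on this page.

```javascript
// Hypothetical helper: assembles the arguments for fetch() without sending
// anything. `video` is the input_video object, for example
// { url: 'https://youtu.be/VIDEO_ID' } or { data: base64, format: 'mp4' }.
function buildVideoRequest(apiKey, model, prompt, video) {
  return {
    url: 'https://api.aivene.com/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages: [{
          role: 'user',
          content: [
            { type: 'text', text: prompt },
            { type: 'input_video', input_video: video },
          ],
        }],
      }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`, after which checking `res.ok` before parsing the body is a sensible default. The response format is not described in this section, so the sketch stops short of reading the reply.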
Supported formats
| Format | MIME Type |
|---|---|
| mp4 | video/mp4 |
| mpeg | video/mpeg |
| mov | video/mov |
| webm | video/webm |
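The table above can be turned into a small lookup for building data URLs. This is a minimal sketch: `VIDEO_MIME_TYPES` and `toVideoDataUrl` are hypothetical names, and the mapping simply mirrors the table.

```javascript
// Format-to-MIME mapping copied from the table above.
const VIDEO_MIME_TYPES = new Map([
  ['mp4', 'video/mp4'],
  ['mpeg', 'video/mpeg'],
  ['mov', 'video/mov'],
  ['webm', 'video/webm'],
]);

// Hypothetical helper: encodes raw video bytes (Buffer or Uint8Array) as a
// data URL, rejecting formats that are not in the table.
function toVideoDataUrl(bytes, format) {
  const mime = VIDEO_MIME_TYPES.get(format.toLowerCase());
  if (!mime) {
    throw new Error(`Unsupported video format: ${format}`);
  }
  return `data:${mime};base64,${Buffer.from(bytes).toString('base64')}`;
}
```

The returned string can be used as the `data` value of an `input_video` part, in which case the `format` field can be omitted.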
Size limits
Inline video data has a 100 MB size limit. For larger files, use YouTube URLs or the provider's native file upload API.
Token cost
Video is tokenized at approximately 300 tokens per second at default resolution, or 100 tokens per second at low resolution. A 1-minute video can consume 6,000-18,000 tokens. Keep clips short for cost efficiency.
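The arithmetic behind these figures is duration multiplied by the per-second rate. A minimal sketch, where `estimateVideoTokens` is a hypothetical helper and the rates are the approximate values quoted above, so the result is an estimate rather than a billing figure:

```javascript
// Approximate token cost of a clip, using the per-second rates quoted above
// (about 300 tokens/s at default resolution, about 100 tokens/s at low).
function estimateVideoTokens(durationSeconds, resolution = 'default') {
  if (resolution !== 'default' && resolution !== 'low') {
    throw new RangeError(`Unknown resolution: ${resolution}`);
  }
  const tokensPerSecond = resolution === 'low' ? 100 : 300;
  return Math.ceil(durationSeconds * tokensPerSecond);
}

const oneMinuteDefault = estimateVideoTokens(60);     // 60 * 300 = 18000
const oneMinuteLow = estimateVideoTokens(60, 'low');  // 60 * 100 = 6000
```

The two results reproduce the 6,000 to 18,000 token range given for a 1-minute video.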
Duration limits
Models with a 1M-token context window can process:
- Up to 1 hour of video at default resolution
- Up to 3 hours at low resolution
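As a rough illustration, the two limits can be used to decide which resolution tier a clip fits into. The helper below is hypothetical and assumes a model with a 1M-token context window. It only reports the tier; how a resolution is selected in a request is not covered in this section.

```javascript
const ONE_HOUR = 60 * 60; // seconds

// Hypothetical helper: returns the highest resolution tier whose documented
// duration limit covers the clip, or null if the clip is too long for both.
function resolutionFor(durationSeconds) {
  if (durationSeconds <= ONE_HOUR) return 'default';
  if (durationSeconds <= 3 * ONE_HOUR) return 'low';
  return null;
}
```

A `null` result suggests splitting the video into shorter segments before sending it.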
Example: YouTube Video Q&A
curl https://api.aivene.com/v1/chat/completions \
  -H "Authorization: Bearer $AIVENE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What are the key events in this video? Provide timestamps." },
        { "type": "input_video", "input_video": { "url": "https://www.youtube.com/watch?v=VIDEO_ID" } }
      ]
    }]
  }'
Timestamps
You can ask about specific moments using MM:SS format:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What happens at 01:30?" },
    { "type": "input_video", "input_video": { "url": "https://youtu.be/VIDEO_ID" } }
  ]
}
Combining with other modalities
Video can be combined with images and text in the same message:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Compare the video with this reference image." },
    { "type": "input_video", "input_video": { "data": "<base64>", "format": "mp4" } },
    { "type": "image_url", "image_url": { "url": "https://example.com/reference.png" } }
  ]
}