Video Understanding

Send videos to a chat model and ask questions about them.

Vision models that support video can accept video clips as part of a user message and reason over them alongside any text. Use this for video summarization, content analysis, action recognition, and visual Q&A on video content.

POST /v1/chat/completions

Which models support video?

Check for the video capability on GET /v1/models. Currently, Gemini models are supported; most other models do not accept video input yet.

YouTube URL input

The easiest way to analyze a video is to pass a YouTube URL directly.

curl https://api.aivene.com/v1/chat/completions \
  -H "Authorization: Bearer $AIVENE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Summarize this video in 3 sentences." },
        { "type": "input_video", "input_video": { "url": "https://www.youtube.com/watch?v=VIDEO_ID" } }
      ]
    }]
  }'

Supported YouTube URL formats:

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/embed/VIDEO_ID
  • https://www.youtube.com/shorts/VIDEO_ID
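
Before sending a request, it can help to check client-side that a URL matches one of the forms listed above. The following is a minimal sketch: extractYouTubeId is a hypothetical helper, not part of the Aivene API, and it deliberately accepts only the four exact forms shown. Whether the API also accepts other hosts or extra query parameters is not stated here.

```javascript
// Hypothetical helper: returns the video ID for the four URL forms listed
// above, or null for anything else. Not part of the Aivene API.
function extractYouTubeId(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== 'https:') return null;

  let id = null;
  if (url.hostname === 'youtu.be') {
    // https://youtu.be/VIDEO_ID
    id = url.pathname.slice(1);
  } else if (url.hostname === 'www.youtube.com') {
    if (url.pathname === '/watch') {
      // https://www.youtube.com/watch?v=VIDEO_ID
      id = url.searchParams.get('v');
    } else {
      // https://www.youtube.com/embed/VIDEO_ID or /shorts/VIDEO_ID
      const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/]+)$/);
      id = match ? match[1] : null;
    }
  }
  // Reject empty IDs and IDs that contain further path segments.
  return id && !id.includes('/') ? id : null;
}
```

A null result means the URL is not in one of the listed forms, so the request is likely to be rejected.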

Limits

  • Only public YouTube videos are supported
  • Max 5 videos per request (inline or URL)
  • Max 5 YouTube URLs per request
  • Max 100 MB per video file
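
These limits can be checked before a request is sent, which avoids uploading a large payload only to have it rejected. The sketch below is illustrative: checkVideoLimits and the shape of its input are hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes and that the limit applies to the raw file size rather than the base64-encoded size. Neither detail is confirmed here.

```javascript
const MAX_VIDEOS_PER_REQUEST = 5;
// Assumption: 1 MB = 1024 * 1024 bytes, measured on the raw (unencoded) file.
const MAX_VIDEO_BYTES = 100 * 1024 * 1024;

// Hypothetical pre-flight check. `videos` is an array of objects shaped like
// { url: string } or { bytes: Buffer | Uint8Array } (an illustrative shape).
// Returns a list of human-readable problems; an empty list means no limit
// from the list above is exceeded.
function checkVideoLimits(videos) {
  const errors = [];
  if (videos.length > MAX_VIDEOS_PER_REQUEST) {
    errors.push(`too many videos: ${videos.length} > ${MAX_VIDEOS_PER_REQUEST}`);
  }
  videos.forEach((video, index) => {
    if (video.bytes && video.bytes.length > MAX_VIDEO_BYTES) {
      errors.push(`video ${index} is ${video.bytes.length} bytes, over the 100 MB limit`);
    }
  });
  return errors;
}
```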

Base64 input

Embed video bytes inline as base64 using the input_video content type.

import { readFile } from 'node:fs/promises';

const bytes = await readFile('clip.mp4');
const base64 = bytes.toString('base64');

const res = await fetch('https://api.aivene.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.AIVENE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'Describe what happens in this video.' },
        { type: 'input_video', input_video: { data: base64, format: 'mp4' } }
      ]
    }]
  })
});
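
The request above stops at the fetch call. Reading the reply might look like the sketch below. It assumes the response body follows the OpenAI chat completions shape (choices[0].message.content), which this page does not confirm, and extractReply is a hypothetical helper name.

```javascript
// Assumption: the response body follows the OpenAI chat completions shape,
// i.e. { choices: [{ message: { content: "..." } }] }.
// Returns the reply text, or null if the body does not have that shape.
function extractReply(body) {
  const content = body?.choices?.[0]?.message?.content;
  return typeof content === 'string' ? content : null;
}

// Usage with the `res` object from the request above:
//   if (!res.ok) throw new Error(`Request failed: ${res.status}`);
//   console.log(extractReply(await res.json()));
```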

Data URL input

You can also pass a data URL with the MIME type prefix:

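import { readFile } from 'node:fs/promises';

// Read the clip and wrap its base64 encoding in a data URL.
const bytes = await readFile('clip.mp4');
const dataUrl = `data:video/mp4;base64,${bytes.toString('base64')}`;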

const res = await fetch('https://api.aivene.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.AIVENE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'What is the main subject of this video?' },
        { type: 'input_video', input_video: { data: dataUrl } }
      ]
    }]
  })
});

When you use a data URL, the format field is optional because the MIME type is extracted from the URL prefix.
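
If several clips are sent this way, the data URL construction can be factored into a small helper. This is a sketch only: videoPartFromBytes is a hypothetical name, and the default MIME type of video/mp4 is an assumption made for convenience.

```javascript
// Hypothetical helper: wraps raw video bytes in an input_video content part
// that uses the data URL form, so no separate `format` field is needed.
function videoPartFromBytes(bytes, mimeType = 'video/mp4') {
  const base64 = Buffer.from(bytes).toString('base64');
  return {
    type: 'input_video',
    input_video: { data: `data:${mimeType};base64,${base64}` },
  };
}
```

The returned object can be placed directly in the content array of a user message.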

Custom content type

input_video is an Aivene extension that is not part of the OpenAI spec. Use fetch or any other HTTP client instead of the OpenAI SDK.

Supported formats

Format    MIME Type
mp4       video/mp4
mpeg      video/mpeg
mov       video/mov
webm      video/webm
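
The table above can be encoded as a lookup so that the format value is derived from a file name. This is a minimal sketch: formatFromFileName is a hypothetical helper, and it assumes the file extension is spelled exactly like the format name, so an extension such as .mpg would not match.

```javascript
// Formats and MIME types copied from the table above.
const VIDEO_MIME_TYPES = new Map([
  ['mp4', 'video/mp4'],
  ['mpeg', 'video/mpeg'],
  ['mov', 'video/mov'],
  ['webm', 'video/webm'],
]);

// Hypothetical helper: infers the `format` value from a file name's extension.
// Returns null when there is no extension or it is not in the table.
function formatFromFileName(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return VIDEO_MIME_TYPES.has(ext) ? ext : null;
}
```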

Size limits

Inline video data has a 100 MB size limit. For larger files, use YouTube URLs or the provider's native file upload API.

Token cost

Video is tokenized at approximately 300 tokens per second at default resolution, or 100 tokens per second at low resolution. A 1-minute video therefore consumes roughly 6,000 tokens at low resolution and 18,000 tokens at default resolution. Keep clips short to control cost.
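
The arithmetic behind these figures can be sketched as a small estimator. The rates are the approximate values quoted above, so the result is an estimate rather than a billing figure. estimateVideoTokens is a hypothetical helper, and its resolution argument is local to the helper, not an API parameter.

```javascript
// Approximate tokenization rates quoted in the text (tokens per second).
const TOKENS_PER_SECOND = { default: 300, low: 100 };

// Hypothetical estimator: duration in seconds times the per-second rate.
function estimateVideoTokens(durationSeconds, resolution = 'default') {
  const rate = TOKENS_PER_SECOND[resolution];
  if (typeof rate !== 'number') {
    throw new RangeError(`unknown resolution: ${resolution}`);
  }
  return Math.ceil(durationSeconds * rate);
}

console.log(estimateVideoTokens(60));        // 18000
console.log(estimateVideoTokens(60, 'low')); // 6000
```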

Duration limits

Models with a 1M-token context window can process:

  • Up to 1 hour of video at default resolution
  • Up to 3 hours at low resolution
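
As an illustration, the two limits above can be turned into a check that picks the highest resolution a clip of a given length fits under. This is a sketch: resolutionForDuration is a hypothetical helper, it applies only to models with a 1M-token context window, and it ignores tokens used by the prompt and the reply.

```javascript
// Duration limits from the list above, in seconds.
const MAX_SECONDS = { default: 1 * 3600, low: 3 * 3600 };

// Hypothetical helper: returns 'default' or 'low' depending on which limit
// the clip fits under, or null when it exceeds both.
function resolutionForDuration(durationSeconds) {
  if (durationSeconds <= MAX_SECONDS.default) return 'default';
  if (durationSeconds <= MAX_SECONDS.low) return 'low';
  return null;
}
```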

Example: YouTube Video Q&A

curl https://api.aivene.com/v1/chat/completions \
  -H "Authorization: Bearer $AIVENE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What are the key events in this video? Provide timestamps." },
        { "type": "input_video", "input_video": { "url": "https://www.youtube.com/watch?v=VIDEO_ID" } }
      ]
    }]
  }'

Timestamps

You can ask about specific moments using MM:SS format:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What happens at 01:30?" },
    { "type": "input_video", "input_video": { "url": "https://youtu.be/VIDEO_ID" } }
  ]
}
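
When the moment of interest is known as a number of seconds, it can be converted to MM:SS before it is placed in the prompt. The helper below is a hypothetical sketch. For offsets of an hour or more it simply lets the minutes field exceed 59, and how the model interprets such values is not specified here.

```javascript
// Hypothetical helper: formats a second offset as MM:SS, zero-padded.
function toTimestamp(totalSeconds) {
  const seconds = Math.floor(totalSeconds);
  const mm = String(Math.floor(seconds / 60)).padStart(2, '0');
  const ss = String(seconds % 60).padStart(2, '0');
  return `${mm}:${ss}`;
}

console.log(`What happens at ${toTimestamp(90)}?`); // What happens at 01:30?
```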

Combining with other modalities

Video can be combined with images and text in the same message:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Compare the video with this reference image." },
    { "type": "input_video", "input_video": { "data": "<base64>", "format": "mp4" } },
    { "type": "image_url", "image_url": { "url": "https://example.com/reference.png" } }
  ]
}