Documents

Send PDFs, Office documents, and text files to a chat model and extract structured data.

Documents are attached as a file content part in a user message. The chat endpoint extracts text and (for vision-capable models) page imagery, then reasons over both together.

POST /v1/chat/completions

Supported formats

PDF, Word (docx), Excel (xlsx), PowerPoint (pptx), RTF, and many text-based formats (txt, md, json, csv, code files, etc.). The maximum total file size is 50 MB per request.
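Before sending, a client-side pre-flight check can reject obviously unsupported or oversized attachments. The sketch below is hypothetical (preflight and KNOWN_EXTENSIONS are not part of any SDK). It lists only the formats named above, although the service accepts more text-based formats, and it assumes the 50 MB limit is counted on raw bytes summed across all attachments, with MB read as 1024 * 1024 bytes.

```typescript
// Hypothetical pre-flight check; not part of any SDK.
// Only the formats named in this section are listed. The service accepts
// more text-based formats, so treat this as a conservative allow-list.
const KNOWN_EXTENSIONS = new Set([
  'pdf', 'docx', 'xlsx', 'pptx', 'rtf', 'txt', 'md', 'json', 'csv',
]);

// Assumption: the 50 MB cap applies to raw file bytes across all attachments,
// and "MB" means 1024 * 1024 bytes.
const MAX_TOTAL_BYTES = 50 * 1024 * 1024;

interface Attachment {
  filename: string;
  sizeBytes: number;
}

// Returns a list of human-readable problems; an empty list means "looks OK".
function preflight(attachments: Attachment[]): string[] {
  const problems: string[] = [];
  let total = 0;
  for (const a of attachments) {
    const dot = a.filename.lastIndexOf('.');
    const ext = dot >= 0 ? a.filename.slice(dot + 1).toLowerCase() : '';
    if (!KNOWN_EXTENSIONS.has(ext)) {
      problems.push(`${a.filename}: extension not in the known list`);
    }
    total += a.sizeBytes;
  }
  if (total > MAX_TOTAL_BYTES) {
    problems.push(`total size ${total} bytes exceeds ${MAX_TOTAL_BYTES} bytes`);
  }
  return problems;
}
```

A file flagged only for its extension may still be accepted by the service; the check is meant to surface surprises early, not to replace server-side validation.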

Inline base64

For one-off requests, embed the file bytes as base64.

import { readFile } from 'node:fs/promises';

const bytes = await readFile('invoice.pdf');
const base64 = bytes.toString('base64');

// `client` is an OpenAI-compatible SDK client configured for this API.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Summarise this invoice in 3 bullets.' },
      {
        type: 'file',
        file: {
          filename: 'invoice.pdf',
          file_data: `data:application/pdf;base64,${base64}`
        }
      }
    ]
  }]
});

Inline encoding is the simplest path. Keep total file size under 50 MB per request, and note that base64 encoding makes the request body roughly a third larger than the raw file.
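The data URL in the example above has the form data:<mime>;base64,<payload>. A small helper can build that part from raw bytes and estimate the encoded length. inlineFilePart, base64Length and the extension-to-MIME map are hypothetical names for illustration, and the map is deliberately incomplete.

```typescript
import { Buffer } from 'node:buffer';

// Illustrative extension -> MIME map; extend it for the formats you send.
const MIME_BY_EXT: Record<string, string> = {
  pdf: 'application/pdf',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  txt: 'text/plain',
  csv: 'text/csv',
  json: 'application/json',
};

interface InlineFilePart {
  type: 'file';
  file: { filename: string; file_data: string };
}

// Builds a `file` content part carrying a base64 data URL.
// Unknown extensions fall back to application/octet-stream, which the
// service may not accept; the mime_type field exists for that case.
function inlineFilePart(filename: string, bytes: Uint8Array): InlineFilePart {
  const ext = filename.split('.').pop()?.toLowerCase() ?? '';
  const mime = MIME_BY_EXT[ext] ?? 'application/octet-stream';
  const base64 = Buffer.from(bytes).toString('base64');
  return { type: 'file', file: { filename, file_data: `data:${mime};base64,${base64}` } };
}

// Base64 emits 4 characters for every 3 input bytes (rounded up, with padding).
function base64Length(byteCount: number): number {
  return Math.ceil(byteCount / 3) * 4;
}
```

For example, base64Length(30 * 1024 * 1024) is 41943040, so a 30 MB PDF becomes about 40 MB of base64 text before the JSON around it is counted.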

By URL

Public URLs are accepted too - this is cheaper for you because the file bytes never pass through your machine.

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is the total on page 2?" },
    {
      "type": "file",
      "file": {
        "filename": "invoice.pdf",
        "file_url": "https://example.com/invoice.pdf"
      }
    }
  ]
}

The URL must be publicly fetchable.
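A client cannot prove that a URL is publicly fetchable, but it can reject URLs the service clearly could not reach, such as non-HTTP schemes, localhost, or private IPv4 addresses. isLikelyPublicUrl below is a hypothetical sketch; it does not check IPv6 ranges, DNS names that resolve to private addresses, or whether the host requires authentication.

```typescript
// Hypothetical sanity check. Passing it does NOT guarantee the service can
// fetch the URL; failing it means the service almost certainly cannot.
function isLikelyPublicUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a valid absolute URL
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;

  const host = url.hostname;
  if (host === 'localhost' || host.endsWith('.local')) return false;

  // Reject private, loopback and link-local IPv4 literals
  // (10/8, 127/8, 172.16/12, 192.168/16, 169.254/16).
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 10 || a === 127) return false;
    if (a === 172 && b >= 16 && b <= 31) return false;
    if (a === 192 && b === 168) return false;
    if (a === 169 && b === 254) return false;
  }
  return true;
}
```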

File part fields

Field	Type	Description
`filename`	string	Original file name - used for extension/MIME detection.
`file_data`	string	Base64-encoded content (with or without a data URL prefix).
`file_url`	string	URL to fetch the file from.
`file_id`	string	Managed file id (`file-...`) returned by `POST /v1/files`. Cannot be combined with `file_data` or `file_url`.
`mime_type`	string	MIME type override if auto-detection fails.
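In TypeScript, these fields can be typed so that the documented file_id exclusivity is checked at compile time, with a runtime twin for untyped input. FileContentPart and validateFileObject are hypothetical names, not SDK types, and requiring at least one source is an assumption rather than a documented rule.

```typescript
// Sketch of the `file` object from the table above. The documented rule is
// that file_id cannot be combined with file_data or file_url; the `never`
// markers let the compiler reject that combination.
type FileSource =
  | { file_id: string; file_data?: never; file_url?: never }
  | { file_id?: never; file_data: string; file_url?: string }
  | { file_id?: never; file_data?: string; file_url: string };

type FileObject = FileSource & { filename?: string; mime_type?: string };

interface FileContentPart {
  type: 'file';
  file: FileObject;
}

// Runtime check for untyped input such as parsed JSON. Returns an error
// message, or null if the object passes. Requiring at least one source is
// an assumption: without one there is nothing to read.
function validateFileObject(f: Record<string, unknown>): string | null {
  const has = (k: string) => typeof f[k] === 'string';
  if (has('file_id') && (has('file_data') || has('file_url'))) {
    return 'file_id cannot be combined with file_data or file_url';
  }
  if (!has('file_id') && !has('file_data') && !has('file_url')) {
    return 'one of file_id, file_data or file_url is required';
  }
  return null;
}
```

With these types, a literal such as { type: 'file', file: { file_id: 'file-abc', file_url: 'https://example.com/a.pdf' } } fails to compile as a FileContentPart.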

There are two ways to attach a file:

Inline - send file_data or file_url on every request. Nothing is persisted on our side.

Managed (Files API) - upload once via POST /v1/files, then reference the returned file_id in every subsequent chat completion. The bytes stay in your project's file storage until you delete them or their expires_after period elapses. Managed files are cached to reduce latency on repeated access, which makes this the right choice for multi-turn conversations, RAG-style workflows, or any time the same document is reused.

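import { createReadStream } from 'node:fs';

// 1. Upload once. `client` is an OpenAI-compatible SDK client; its file
// upload accepts a read stream rather than a raw Buffer.
const uploaded = await client.files.create({
  file: createReadStream('invoice.pdf'),
  purpose: 'user_data'
});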

// 2. Reference by id - any number of times
await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Total on page 2?' },
      { type: 'file', file: { file_id: uploaded.id } }
    ]
  }]
});
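When the same document shows up repeatedly, for example across turns of a conversation, a small cache can ensure it is uploaded only once. The sketch below is hypothetical: FileIdCache keys uploads by a SHA-256 hash of the bytes, and the upload function is injected (in practice it would wrap client.files.create and return the id). The cache is in-memory only and knows nothing about expires_after, so ids held past the expiry window may no longer resolve.

```typescript
import { createHash } from 'node:crypto';

// Injected uploader: takes the bytes and a filename, resolves to a file id.
type Uploader = (bytes: Uint8Array, filename: string) => Promise<string>;

class FileIdCache {
  private readonly upload: Uploader;
  // Stores the in-flight or settled promise, so concurrent callers with the
  // same bytes share one upload instead of racing.
  private readonly ids = new Map<string, Promise<string>>();

  constructor(upload: Uploader) {
    this.upload = upload;
  }

  // Keyed by the SHA-256 of the bytes, so renamed copies share one id.
  getId(bytes: Uint8Array, filename: string): Promise<string> {
    const key = createHash('sha256').update(bytes).digest('hex');
    let id = this.ids.get(key);
    if (!id) {
      id = this.upload(bytes, filename);
      this.ids.set(key, id);
      // Forget failed uploads so a later call can retry.
      id.catch(() => this.ids.delete(key));
    }
    return id;
  }
}
```

Caching the promise rather than the resolved id is a deliberate choice: two requests issued at the same moment still trigger a single upload.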

When to pick which

One-shot question, file < 1 MB → inline file_data.
Public URL → file_url.
Same document used multiple times, or large file → upload via Files API and pass file_id. Saves bandwidth on every follow-up request.
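The rules above can be written as a small decision function. chooseAttachMode is a hypothetical helper and only one reading of the guidance; where the rules overlap (for example a large file that is already at a public URL and used once), it prefers file_url, which is a judgment call.

```typescript
type AttachMode = 'file_data' | 'file_url' | 'file_id';

interface DocumentInfo {
  sizeBytes: number;
  publicUrl?: string;   // set if the file is already at a public URL
  expectedUses: number; // how many requests will reference this document
}

const ONE_MB = 1024 * 1024;

// Thresholds are rules of thumb from the list above, not service limits.
function chooseAttachMode(doc: DocumentInfo): AttachMode {
  if (doc.expectedUses > 1) return 'file_id';     // reused: upload once
  if (doc.publicUrl) return 'file_url';           // already hosted publicly
  if (doc.sizeBytes < ONE_MB) return 'file_data'; // small one-shot: inline
  return 'file_id';                               // large one-shot: Files API
}
```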

Multi-document Q&A

Attach several files in one message - the model reads them as related context. This is useful for diff-style reviews, such as comparing two versions of a contract.

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Which clauses changed between v1 and v2?" },
    { "type": "file", "file": { "filename": "v1.pdf", "file_url": "https://.../v1.pdf" } },
    { "type": "file", "file": { "filename": "v2.pdf", "file_url": "https://.../v2.pdf" } }
  ]
}
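A helper that builds this message shape from a question and a list of documents keeps the file parts in a predictable order. multiDocMessage is a hypothetical name; the object it returns has the same shape as the JSON above.

```typescript
interface UrlDoc {
  filename: string;
  url: string;
}

// One user message: the question first, then one `file` part per document,
// in the order given (e.g. v1 before v2).
function multiDocMessage(question: string, docs: UrlDoc[]) {
  return {
    role: 'user' as const,
    content: [
      { type: 'text' as const, text: question },
      ...docs.map((d) => ({
        type: 'file' as const,
        file: { filename: d.filename, file_url: d.url },
      })),
    ],
  };
}
```

Distinct, descriptive filenames give the prompt something to refer to, for example "v1.pdf" and "v2.pdf" in the question text.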

Structured extraction

Pair documents with a response_format of type 'json_schema' to get output that follows your schema directly, with no regex post-processing.

// `invoiceUrl` is a publicly fetchable URL to the invoice PDF.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'invoice',
      schema: {
        type: 'object',
        properties: {
          invoice_number: { type: 'string' },
          issue_date: { type: 'string', format: 'date' },
          line_items: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                description: { type: 'string' },
                quantity: { type: 'integer' },
                unit_price: { type: 'number' }
              },
              required: ['description', 'quantity', 'unit_price']
            }
          },
          total: { type: 'number' }
        },
        required: ['invoice_number', 'issue_date', 'total']
      }
    }
  },
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract invoice fields.' },
      { type: 'file', file: { filename: 'invoice.pdf', file_url: invoiceUrl } }
    ]
  }]
});

const data = JSON.parse(res.choices[0].message.content ?? '{}');
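JSON.parse returns an untyped value, and the output is not guaranteed to match the schema in every case (for example when generation is cut off by a token limit). A type guard like the hypothetical isInvoice below checks the shape before the data is used; it mirrors the schema above, including which fields are required, but does not re-validate the date format.

```typescript
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
}

interface Invoice {
  invoice_number: string;
  issue_date: string;
  line_items?: LineItem[]; // not in the schema's top-level `required` list
  total: number;
}

// Narrows an unknown value (e.g. the result of JSON.parse) to Invoice.
function isInvoice(v: unknown): v is Invoice {
  if (typeof v !== 'object' || v === null) return false;
  const o = v as Record<string, unknown>;
  if (typeof o.invoice_number !== 'string') return false;
  if (typeof o.issue_date !== 'string') return false;
  if (typeof o.total !== 'number') return false;
  if (o.line_items === undefined) return true;
  return (
    Array.isArray(o.line_items) &&
    o.line_items.every(
      (li) =>
        typeof li === 'object' &&
        li !== null &&
        typeof li.description === 'string' &&
        Number.isInteger(li.quantity) && // schema declares quantity as integer
        typeof li.unit_price === 'number',
    )
  );
}
```

Usage: if (!isInvoice(data)) handle the failure (retry or flag for review); otherwise data is typed as Invoice from that point on.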

Long documents

If a document is too large to send in one request or to fit in the model's context window, you have three options, ranked from simplest to most involved:

Truncate - send only the relevant pages. The model is faster and cheaper when it has less to read.
Map-reduce - chunk the document, summarise each chunk, then summarise the summaries.
RAG - embed each chunk with the Embeddings API, retrieve the top-k chunks most similar to the question, and attach only those to the chat request.
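The chunking and retrieval steps can be sketched without any API calls. chunkText, cosine and topK are hypothetical helpers: chunkText splits by characters (a token-based splitter is more precise), its output can feed either map-reduce or RAG, and topK assumes you already have one embedding vector per chunk plus one for the question.

```typescript
// Fixed-size character windows with overlap, so a sentence cut at one
// boundary still appears whole in the neighbouring chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error('require size > 0 and 0 <= overlap < size');
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Cosine similarity; returns 0 for a zero vector instead of dividing by 0.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// RAG retrieval step: indices of the k chunks most similar to the query.
function topK(query: number[], chunkVecs: number[][], k: number): number[] {
  return chunkVecs
    .map((v, i) => ({ i, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.i);
}
```

Only the chunks at the returned indices are then attached to the chat request, which keeps prompt tokens proportional to k rather than to the document length.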

Cost notes

Document content is reported as prompt tokens in the usage object, the same way text is. Vision-capable models additionally count image tokens for any rendered pages.
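To turn usage into a cost estimate, multiply token counts by your model's rates. estimateCost and the Pricing shape are hypothetical, and the prices you pass in are placeholders. The sketch assumes, as is common for OpenAI-compatible APIs, that image tokens for rendered pages are already included in prompt_tokens rather than reported separately.

```typescript
// Standard fields of the `usage` object on a chat completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical per-million-token prices; substitute your model's real rates.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Assumption: document text and any page-image tokens are already counted
// in prompt_tokens, so there is no separate document term.
function estimateCost(usage: Usage, price: Pricing): number {
  return (
    (usage.prompt_tokens / 1_000_000) * price.inputPerMillion +
    (usage.completion_tokens / 1_000_000) * price.outputPerMillion
  );
}
```

Called as estimateCost(res.usage, pricing), this makes it easy to compare, for example, sending a whole document against sending only the retrieved chunks.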

Scanned PDFs

Pure-image PDFs (scans, photos of paper) need a vision-capable model (e.g. gpt-4o, claude-sonnet). Text extraction finds little or nothing in these files, so a text-only model has almost no content to reason over.

Errors

Status	Reason
`400`	URL unreachable, unsupported MIME type, or document rejected by safety filter.
`413`	Inline base64 payload exceeded the request body size limit.
`415`	Model does not support file input.
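A caller can branch on these status codes to decide what to do next. remedyFor is a hypothetical helper, and mapping 413 to the Files API is a suggested remedy based on the guidance earlier on this page, not documented server behavior. If you use the OpenAI Node SDK, the status code is available as the status property of the thrown APIError.

```typescript
type Remedy = 'fix_request' | 'use_files_api' | 'switch_model' | 'unknown';

// Maps the documented status codes to a next step. 400 covers several
// causes (unreachable URL, unsupported MIME type, safety rejection), so
// inspect the error message before changing anything.
function remedyFor(status: number): Remedy {
  switch (status) {
    case 400:
      return 'fix_request';
    case 413:
      return 'use_files_api'; // upload once, then send file_id instead of base64
    case 415:
      return 'switch_model'; // choose a model that accepts file input
    default:
      return 'unknown';
  }
}
```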

Next steps

Image Understanding - the same content-part pattern with images.
Embeddings reference - for RAG over long documents.