Documents

Send PDFs, Office documents, and text files to a chat model and extract structured data.

Documents are sent as a file content part on a user message. The chat endpoint extracts text and (for vision-capable models) page imagery, then reasons over both together.

POST /v1/chat/completions

Supported formats

PDF, Word (docx), Excel (xlsx), PowerPoint (pptx), RTF, and many text-based formats (txt, md, json, csv, code files, etc.). Max total file size is 50 MB per request.
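A pre-flight check can catch oversized requests before any bytes are sent. The sketch below is illustrative only: `preflight`, `KNOWN_EXTENSIONS`, and `MAX_TOTAL_BYTES` are hypothetical names, the extension list covers only the formats named above (other text-based formats are also accepted), and it assumes the 50 MB limit is measured on raw bytes rather than on the base64-encoded body.

```javascript
// Hypothetical helper, not part of any SDK.
// The set lists only the formats named in this section; it is not exhaustive.
const KNOWN_EXTENSIONS = new Set(['pdf', 'docx', 'xlsx', 'pptx', 'rtf', 'txt', 'md', 'json', 'csv']);

// Assumption: "50 MB" is read as 50 * 1024 * 1024 raw bytes.
const MAX_TOTAL_BYTES = 50 * 1024 * 1024;

// files: [{ filename: string, sizeBytes: number }]
function preflight(files) {
  const totalBytes = files.reduce((sum, f) => sum + f.sizeBytes, 0);
  // Extensions outside the short list are reported for review, not rejected,
  // because code files and other text formats are also supported.
  const unrecognised = files
    .map((f) => f.filename)
    .filter((name) => !KNOWN_EXTENSIONS.has(name.split('.').pop().toLowerCase()));
  return { totalBytes, withinLimit: totalBytes <= MAX_TOTAL_BYTES, unrecognised };
}
```

The result separates the hard limit (`withinLimit`) from the soft warning (`unrecognised`), so a caller can block on the first and only log the second.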

Inline base64

For one-off requests, embed the file bytes as base64.

import { readFile } from 'node:fs/promises';

const bytes = await readFile('invoice.pdf');
const base64 = bytes.toString('base64');

// `client` is an SDK client you have already configured with your API key.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Summarise this invoice in 3 bullets.' },
      {
        type: 'file',
        file: {
          filename: 'invoice.pdf',
          file_data: `data:application/pdf;base64,${base64}`
        }
      }
    ]
  }]
});

Inline encoding is the simplest path. Keep total file size under 50 MB per request.
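If you build inline parts in several places, a small helper keeps the data URL format consistent. This is a minimal sketch: `toInlineFilePart` and `base64Length` are hypothetical names, and the default MIME type is an assumption chosen for the invoice example. Base64 emits 4 characters for every 3 input bytes, so the encoded body is roughly a third larger than the file itself.

```javascript
// Hypothetical helper: wraps raw bytes in the file content part shown above.
function toInlineFilePart(filename, bytes, mimeType = 'application/pdf') {
  const base64 = Buffer.from(bytes).toString('base64');
  return {
    type: 'file',
    file: { filename, file_data: `data:${mimeType};base64,${base64}` }
  };
}

// Length in characters of the base64 encoding of `byteCount` bytes (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}
```

`base64Length` is useful for estimating the request body size before reading the file into memory.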

By URL

Public URLs are accepted too. This saves upload bandwidth because the bytes do not transit your machine.

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is the total on page 2?" },
    {
      "type": "file",
      "file": {
        "filename": "invoice.pdf",
        "file_url": "https://example.com/invoice.pdf"
      }
    }
  ]
}

The URL must be publicly fetchable.
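A quick client-side heuristic can reject URLs that obviously cannot be reached from outside your network. The sketch below is an assumption-laden illustration, not the service's own validation: `looksPubliclyFetchable` is a hypothetical name, it only inspects the URL string (no network call), and it does not cover IPv6 literals or URLs that require authentication.

```javascript
// Hypothetical heuristic: returns false for URLs that are clearly not public.
function looksPubliclyFetchable(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;

  const host = url.hostname;
  if (host === 'localhost' || host.endsWith('.local')) return false;

  // Only apply range checks to IPv4 literals, not to names like "10.example.com".
  if (/^\d+\.\d+\.\d+\.\d+$/.test(host)) {
    if (/^(10|127)\./.test(host)) return false;          // private / loopback
    if (/^(192\.168|169\.254)\./.test(host)) return false; // private / link-local
    if (/^172\.(1[6-9]|2\d|3[01])\./.test(host)) return false; // 172.16.0.0/12
  }
  return true;
}
```

A `true` result only means the URL is not obviously private; the server may still fail to fetch it.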

File part fields

Field      Type    Description
filename   string  Original file name - used for extension/MIME detection.
file_data  string  Base64 encoded content (with or without data URL prefix).
file_url   string  URL to fetch the file from.
file_id    string  Managed file id (file-...) returned by POST /v1/files. Cannot be combined with file_data or file_url.
mime_type  string  MIME type override if auto-detection fails.
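The one constraint the field list states, that file_id cannot be combined with file_data or file_url, is easy to enforce before sending. The following is a minimal sketch with a hypothetical `validateFilePart` helper. The rule that at least one source field must be present is an assumption inferred from the examples, not something the field list states.

```javascript
// Hypothetical client-side check for the `file` object of a file content part.
// Returns a list of problems; an empty list means no problem was detected.
function validateFilePart(file) {
  const errors = [];
  const sources = ['file_data', 'file_url', 'file_id'].filter((key) => file[key] !== undefined);

  // Assumption: a part with no source field cannot be resolved to any bytes.
  if (sources.length === 0) {
    errors.push('one of file_data, file_url, or file_id is required');
  }
  // Stated in the field list: file_id excludes the two inline fields.
  if (file.file_id !== undefined && sources.length > 1) {
    errors.push('file_id cannot be combined with file_data or file_url');
  }
  return errors;
}
```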

There are two ways to attach a file:

Inline - send file_data or file_url on every request. Nothing is persisted on our side.

Managed (Files API) - upload once via POST /v1/files, then reference the returned file_id in every subsequent chat completion. The bytes live in your project's file storage until you delete them or expires_after elapses. Managed files are cached to reduce latency on repeated access. This is the right choice for multi-turn conversations, RAG-style workflows, or any time the same document is reused.

import { readFile } from 'node:fs/promises';

// 1. Upload once
const uploaded = await client.files.create({
  file: await readFile('invoice.pdf'),
  purpose: 'user_data'
});

// 2. Reference by id - any number of times
await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Total on page 2?' },
      { type: 'file', file: { file_id: uploaded.id } }
    ]
  }]
});

When to pick which

  • One-shot question, file < 1 MB → inline file_data.
  • Public URL → file_url.
  • Same document used multiple times, or large file → upload via Files API and pass file_id. Saves bandwidth on every follow-up request.
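The rules of thumb above can be written as a small decision function. This is a sketch only: `chooseAttachment` is a hypothetical name, and the precedence when several rules apply (reuse first, then public URL, then size) is an assumption, since the list does not rank them.

```javascript
const ONE_MB = 1024 * 1024;

// Hypothetical helper: returns which file part field to use.
// input: { sizeBytes: number, publicUrl: boolean, reused: boolean }
function chooseAttachment({ sizeBytes, publicUrl, reused }) {
  if (reused) return 'file_id';              // same document used multiple times
  if (publicUrl) return 'file_url';          // already hosted at a public URL
  if (sizeBytes < ONE_MB) return 'file_data'; // small one-shot file: inline
  return 'file_id';                          // large file: upload via Files API
}
```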

Multi-document Q&A

Attach several files in one message - the model reads them as related context. Useful for diff-style reviews or comparing two contracts.

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Which clauses changed between v1 and v2?" },
    { "type": "file", "file": { "filename": "v1.pdf", "file_url": "https://.../v1.pdf" } },
    { "type": "file", "file": { "filename": "v2.pdf", "file_url": "https://.../v2.pdf" } }
  ]
}
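When the number of documents varies, the message can be assembled programmatically. A minimal sketch, assuming a hypothetical `buildMultiFileMessage` helper that takes the question and an array of file objects shaped like the ones above:

```javascript
// Hypothetical helper: one text part followed by one file part per document.
// files: [{ filename, file_url }] or any other valid `file` object.
function buildMultiFileMessage(question, files) {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      ...files.map((file) => ({ type: 'file', file }))
    ]
  };
}
```

Keeping the question first and the files in a stable order makes it easier to refer to them by name ("v1", "v2") in the prompt.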

Structured extraction

Pair documents with a response_format of type 'json_schema' to get typed output directly. No regex post-processing needed.

const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'invoice',
      schema: {
        type: 'object',
        properties: {
          invoice_number: { type: 'string' },
          issue_date: { type: 'string', format: 'date' },
          line_items: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                description: { type: 'string' },
                quantity: { type: 'integer' },
                unit_price: { type: 'number' }
              },
              required: ['description', 'quantity', 'unit_price']
            }
          },
          total: { type: 'number' }
        },
        required: ['invoice_number', 'issue_date', 'total']
      }
    }
  },
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract invoice fields.' },
      { type: 'file', file: { filename: 'invoice.pdf', file_url: invoiceUrl } }
    ]
  }]
});

const data = JSON.parse(res.choices[0].message.content ?? '{}');
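Schema-shaped output can still contain wrong values, so a sanity check on the parsed object is worthwhile. The sketch below is one possible check, not a required step: `checkInvoice` is a hypothetical name, and comparing the line-item sum to `total` assumes the invoice has no tax, discount, or shipping lines, so a mismatch should be treated as a flag for review rather than an error.

```javascript
// Hypothetical post-extraction check for the invoice schema above.
function checkInvoice(data, tolerance = 0.01) {
  // Top-level fields the schema marks as required.
  const missing = ['invoice_number', 'issue_date', 'total'].filter((key) => data[key] === undefined);

  // line_items is optional in the schema, so fall back to an empty list.
  const items = Array.isArray(data.line_items) ? data.line_items : [];
  const lineSum = items.reduce((sum, item) => sum + item.quantity * item.unit_price, 0);

  // With no line items there is nothing to compare against.
  const totalMatches = items.length === 0 || Math.abs(lineSum - data.total) <= tolerance;
  return { missing, lineSum, totalMatches };
}
```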

Long documents

If the document is too large to send whole, you have three options, from simplest to most involved:

  1. Truncate - send only the relevant pages. The model is faster and cheaper when it has less to read.
  2. Map-reduce - chunk the document, summarise each chunk, then summarise the summaries.
  3. RAG - embed each chunk with an embeddings model, retrieve the top-k chunks for a question, and only attach those to the chat request.
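Options 2 and 3 both start by chunking, and option 3 adds a similarity ranking. The sketch below shows those two building blocks under stated assumptions: `chunkText`, `cosine`, and `topK` are hypothetical names, chunking is by character count with overlap (token-based or page-based splitting may suit your documents better), and the embedding vectors are assumed to come from whatever embeddings endpoint you use.

```javascript
// Split text into fixed-size character chunks that overlap by `overlap` characters.
function chunkText(text, size = 1000, overlap = 100) {
  if (overlap >= size) throw new RangeError('overlap must be smaller than size');
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Cosine similarity between two equal-length, non-zero numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Indices of the k chunk vectors most similar to the query vector, best first.
function topK(queryVec, chunkVecs, k) {
  return chunkVecs
    .map((vec, index) => ({ index, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((result) => result.index);
}
```

For map-reduce, summarise each element of `chunkText(...)` and then summarise the joined summaries; for RAG, attach only the chunks whose indices `topK` returns.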

Cost notes

Document processing usage is reported as prompt tokens in usage, the same way text is. Vision-capable models additionally count image tokens for any rendered pages.
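To track what documents cost across a conversation, you can sum the prompt tokens from each response. A minimal sketch: `totalPromptTokens` is a hypothetical helper, and the `usage.prompt_tokens` field name is an assumption based on the usual chat completions response shape, so check it against an actual response.

```javascript
// Hypothetical helper: sums prompt tokens over a list of chat completion responses.
// Responses without a usage block count as zero.
function totalPromptTokens(responses) {
  return responses.reduce((sum, res) => sum + (res.usage?.prompt_tokens ?? 0), 0);
}
```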

Scanned PDFs

Pure-image PDFs (scans, photos of paper) need a vision-capable model (e.g. gpt-4o, claude-sonnet). Text-only models will return very little because they cannot read the pixels.

Errors

Status  Reason
400     URL unreachable, unsupported MIME type, or document rejected by safety filter.
413     Inline base64 exceeded the body size limit.
415     Model does not support file input.
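The status codes above map naturally to next actions in client code. The sketch below is illustrative: `explainFileError` is a hypothetical name, the suggested actions are inferred from the rest of this page rather than prescribed by the API, and how you obtain the HTTP status depends on your SDK's error type.

```javascript
// Hypothetical helper: turns a file-related HTTP status into a suggested next step.
function explainFileError(status) {
  switch (status) {
    case 400:
      return 'Check that the URL is publicly fetchable and the file type is supported; set mime_type if detection failed.';
    case 413:
      return 'Inline payload too large: upload via POST /v1/files and pass file_id, or use file_url.';
    case 415:
      return 'Pick a model that supports file input.';
    default:
      return 'Not a file-specific error; handle as a general API error.';
  }
}
```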

Next steps