Documents
Send PDFs, Office documents, and text files to a chat model and extract structured data.
Documents are sent as a file content part on a user message. The chat
endpoint extracts text and (for vision-capable models) page imagery, then
reasons over both together.
POST /v1/chat/completions
Supported formats
PDF, Word (docx), Excel (xlsx), PowerPoint (pptx), RTF, and many text-based formats (txt, md, json, csv, code files, etc.). Max total file size is 50 MB per request.
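A pre-flight check can catch oversized or unrecognised files before a request is sent. The sketch below is illustrative only: `checkAttachments` is a hypothetical helper rather than part of any SDK, its extension list covers just the formats named above (the endpoint accepts other text-based formats as well), and it assumes 50 MB means 50 × 1024 × 1024 bytes.

```javascript
// Hypothetical pre-flight check; not part of any SDK.
// Assumption: "50 MB" means 50 * 1024 * 1024 bytes.
const MAX_TOTAL_BYTES = 50 * 1024 * 1024;

// Only the extensions named in this guide; other text-based formats are accepted too.
const KNOWN_EXTENSIONS = new Set([
  'pdf', 'docx', 'xlsx', 'pptx', 'rtf', 'txt', 'md', 'json', 'csv'
]);

// files: array of { filename: string, sizeBytes: number }
function checkAttachments(files) {
  const problems = [];
  let totalBytes = 0;
  for (const { filename, sizeBytes } of files) {
    totalBytes += sizeBytes;
    const dot = filename.lastIndexOf('.');
    const ext = dot === -1 ? '' : filename.slice(dot + 1).toLowerCase();
    if (!KNOWN_EXTENSIONS.has(ext)) {
      problems.push(`${filename}: extension not in the known list`);
    }
  }
  // The limit applies to the total across all files in one request.
  if (totalBytes > MAX_TOTAL_BYTES) {
    problems.push(`total size ${totalBytes} bytes exceeds ${MAX_TOTAL_BYTES}`);
  }
  return { ok: problems.length === 0, totalBytes, problems };
}
```

An unknown extension is reported as a problem here only to prompt a second look; it does not necessarily mean the endpoint would reject the file.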
Inline base64
For one-off requests, embed the file bytes as base64.
import { readFile } from 'node:fs/promises';
// `client` is assumed to be an already-initialised SDK client.
const bytes = await readFile('invoice.pdf');
const base64 = bytes.toString('base64');
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Summarise this invoice in 3 bullets.' },
      {
        type: 'file',
        file: {
          filename: 'invoice.pdf',
          file_data: `data:application/pdf;base64,${base64}`
        }
      }
    ]
  }]
});
Inline encoding is the simplest path. Keep total file size under 50 MB per request.
By URL
Public URLs are accepted too - cheaper because the bytes do not transit your machine.
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is the total on page 2?" },
    {
      "type": "file",
      "file": {
        "filename": "invoice.pdf",
        "file_url": "https://example.com/invoice.pdf"
      }
    }
  ]
}
The URL must be publicly fetchable.
File part fields
| Field | Type | Description |
|---|---|---|
| filename | string | Original file name - used for extension/MIME detection. |
| file_data | string | Base64 encoded content (with or without data URL prefix). |
| file_url | string | URL to fetch the file from. |
| file_id | string | Managed file id (file-...) returned by POST /v1/files. Cannot be combined with file_data or file_url. |
| mime_type | string | MIME type override if auto-detection fails. |
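The constraint on file_id in the table above is easy to violate when file parts are assembled by hand. As a minimal sketch, a small builder can enforce it before the request is sent. `buildFilePart` and its camelCase option names are hypothetical; only the emitted field names come from the table.

```javascript
// Hypothetical helper that builds a `file` content part from the fields in the table.
function buildFilePart({ filename, fileData, fileUrl, fileId, mimeType }) {
  const sources = [fileData, fileUrl, fileId].filter((v) => v !== undefined);
  if (sources.length === 0) {
    throw new Error('Provide one of fileData, fileUrl, or fileId');
  }
  // Documented rule: file_id cannot be combined with file_data or file_url.
  if (fileId !== undefined && sources.length > 1) {
    throw new Error('file_id cannot be combined with file_data or file_url');
  }
  // Whether file_data and file_url may be combined is not documented;
  // this sketch does not check it.
  const file = {};
  if (filename !== undefined) file.filename = filename;
  if (fileData !== undefined) file.file_data = fileData;
  if (fileUrl !== undefined) file.file_url = fileUrl;
  if (fileId !== undefined) file.file_id = fileId;
  if (mimeType !== undefined) file.mime_type = mimeType;
  return { type: 'file', file };
}
```

The returned object can be placed directly in a user message's content array next to a text part.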
There are two ways to attach a file:
- Inline - send file_data or file_url on every request. Nothing
  is persisted on our side.
- Managed (Files API) - upload once via
  POST /v1/files, then reference the
  returned file_id in every subsequent chat completion. The bytes live in
  your project's file storage until you delete them or expires_after
  elapses. Managed files are cached to reduce latency on repeated access.
  This is the right choice for multi-turn conversations, RAG-style
  workflows, or any time the same document is reused.
import * as fs from 'node:fs/promises';
// 1. Upload once
const uploaded = await client.files.create({
  file: await fs.readFile('invoice.pdf'),
  purpose: 'user_data'
});
// 2. Reference by id - any number of times
await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Total on page 2?' },
      { type: 'file', file: { file_id: uploaded.id } }
    ]
  }]
});
When to pick which
- One-shot question, file < 1 MB → inline file_data.
- Public URL → file_url.
- Same document used multiple times, or large file → upload via Files API and pass file_id. Saves bandwidth on every follow-up request.
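The rules of thumb above can be written as a small decision function. This is a sketch under stated assumptions: `chooseAttachment` is a hypothetical helper, "large" is taken to mean 1 MB or more (with 1 MB as 1024 × 1024 bytes), and reuse is given precedence over a public URL, which the guidance above does not spell out.

```javascript
const ONE_MB = 1024 * 1024; // assumption: binary megabyte

// Returns which file-part field to use: 'file_data', 'file_url', or 'file_id'.
function chooseAttachment({ sizeBytes, publicUrl, reused }) {
  // Same document used multiple times: upload once via the Files API.
  if (reused) return 'file_id';
  // A publicly fetchable URL avoids sending the bytes yourself.
  if (publicUrl) return 'file_url';
  // One-shot question on a small file: inline base64 is simplest.
  if (sizeBytes < ONE_MB) return 'file_data';
  // Large one-off file with no public URL: treated as a Files API case (assumption).
  return 'file_id';
}
```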
Multi-document Q&A
Attach several files in one message - the model reads them as related context. Useful for diff-style reviews or comparing two contracts.
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Which clauses changed between v1 and v2?" },
    { "type": "file", "file": { "filename": "v1.pdf", "file_url": "https://.../v1.pdf" } },
    { "type": "file", "file": { "filename": "v2.pdf", "file_url": "https://.../v2.pdf" } }
  ]
}
Structured extraction
Pair documents with a response_format of type 'json_schema' to get typed
output directly, with no regex post-processing.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'invoice',
      schema: {
        type: 'object',
        properties: {
          invoice_number: { type: 'string' },
          issue_date: { type: 'string', format: 'date' },
          line_items: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                description: { type: 'string' },
                quantity: { type: 'integer' },
                unit_price: { type: 'number' }
              },
              required: ['description', 'quantity', 'unit_price']
            }
          },
          total: { type: 'number' }
        },
        required: ['invoice_number', 'issue_date', 'total']
      }
    }
  },
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract invoice fields.' },
      { type: 'file', file: { filename: 'invoice.pdf', file_url: invoiceUrl } }
    ]
  }]
});
const data = JSON.parse(res.choices[0].message.content ?? '{}');
Long documents
If the document is huge, you have three options ranked by simplicity:
- Truncate - send only the relevant pages. The model is faster and cheaper when it has less to read.
- Map-reduce - chunk the document, summarise each chunk, then summarise the summaries.
- RAG - embed each chunk with the Embeddings API, retrieve the top-k chunks for a question, and attach only those to the chat request.
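The map-reduce option can be sketched as follows. This is a minimal illustration, not a recommended implementation: `chunkText` and `mapReduceSummary` are hypothetical names, chunks are cut at fixed character counts rather than at sentence or page boundaries, and `summarise` is a caller-supplied async function (for example, a thin wrapper around a chat completion call).

```javascript
// Split text into consecutive chunks of at most `maxChars` characters.
function chunkText(text, maxChars) {
  if (maxChars <= 0) throw new Error('maxChars must be positive');
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// summarise: async (text) => summary string, supplied by the caller.
async function mapReduceSummary(text, summarise, maxChars = 8000) {
  const chunks = chunkText(text, maxChars);
  if (chunks.length === 0) return '';
  // Map: summarise each chunk independently (in parallel here).
  const partials = await Promise.all(chunks.map((chunk) => summarise(chunk)));
  // A single chunk needs no reduce step.
  if (partials.length === 1) return partials[0];
  // Reduce: summarise the concatenated partial summaries.
  return summarise(partials.join('\n\n'));
}
```

The single reduce pass assumes the joined partial summaries fit in one request; for very long inputs the reduce step may need to be applied recursively.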
Cost notes
Document processing usage is reported as prompt tokens in usage, the same way
text is. Vision-capable models additionally count image tokens for any
rendered pages.
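To track what document requests cost, the prompt token counts can be summed across responses. The sketch below assumes each response exposes `usage.prompt_tokens`, as OpenAI-compatible chat completion responses do; that field name is an assumption, not something stated here, and `totalPromptTokens` is a hypothetical helper.

```javascript
// Sums prompt tokens across several chat completion responses.
// Assumption: each response has the shape { usage: { prompt_tokens: number } }.
function totalPromptTokens(responses) {
  return responses.reduce(
    // Responses without usage data contribute zero.
    (sum, res) => sum + (res.usage?.prompt_tokens ?? 0),
    0
  );
}
```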
Scanned PDFs
Pure-image PDFs (scans, photos of paper) need a vision-capable model
(e.g. gpt-4o, claude-sonnet). Text-only models will return very
little because they cannot read the pixels.
Errors
| Status | Reason |
|---|---|
| 400 | URL unreachable, unsupported MIME type, or document rejected by safety filter. |
| 413 | Inline base64 exceeded the body size limit. |
| 415 | Model does not support file input. |
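One way to act on these statuses is to map each to a suggested next step. `suggestFix` is a hypothetical helper, and the remedies are suggestions inferred from the reasons in the table, not documented behaviour.

```javascript
// Hypothetical helper: maps a documented status code to a suggested remedy.
function suggestFix(status) {
  switch (status) {
    case 400:
      return 'Check that the URL is publicly reachable and the MIME type is supported; ' +
        'the document may also have been rejected by the safety filter.';
    case 413:
      return 'Inline base64 body too large: send a smaller file, ' +
        'or use file_url or a managed file_id instead.';
    case 415:
      return 'Pick a model that supports file input.';
    default:
      return 'Not a documented file-input error; inspect the response body.';
  }
}
```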
Next steps
- Image Understanding - the same content-part pattern with images.
- Embeddings reference - for RAG over long documents.