Mistral Ships OCR 4 — Self-Hostable Document AI in 170 Languages
TL;DR: France’s Mistral AI shipped OCR 4, a self-hostable, structure-aware document-AI model that supports 170 languages and costs $4 per 1,000 pages. It posted top scores on the OlmOCRBench and OmniDocBench document benchmarks and is pitched as a private, on-premises alternative to cloud-only document APIs.
Europe’s leading AI lab has a new pitch for enterprises wary of the cloud. On June 23, 2026, Mistral AI shipped OCR 4, a self-hostable document model.
The release
OCR 4 is a structure-aware document-AI model that supports 170 languages. It is priced at $4 per 1,000 pages, dropping to $2 with a Batch-API discount. It posted top scores on two document benchmarks, OlmOCRBench (85.20) and OmniDocBench (93.07), with a 72% average human-preference win rate versus competitors, and it runs in a single container for fully self-hosted deployment.
| Metric | OCR 4 |
|---|---|
| Languages | 170 |
| Price | $4 / 1,000 pages ($2 batch) |
| OlmOCRBench | 85.20 |
| OmniDocBench | 93.07 |
| Human-preference win rate | 72% average vs. competitors |
| Deployment | Self-hosted (single container) |
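To make the pricing concrete, here is a minimal sketch of the arithmetic in Python. It assumes the quoted rates scale linearly with page count, with no volume tiers, minimum charges, or taxes, none of which the article addresses. The names `estimate_cost`, `STANDARD_PRICE_PER_1000`, and `BATCH_PRICE_PER_1000` are hypothetical and not part of any Mistral SDK. Self-hosted infrastructure costs are not modeled.

```python
from decimal import Decimal

# Rates quoted in the article, in US dollars per 1,000 pages.
STANDARD_PRICE_PER_1000 = Decimal("4")
BATCH_PRICE_PER_1000 = Decimal("2")


def estimate_cost(pages: int, batch: bool = False) -> Decimal:
    """Estimate the cost in USD for `pages` pages.

    Assumes simple linear pricing (an assumption, not a documented billing rule).
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_PRICE_PER_1000 if batch else STANDARD_PRICE_PER_1000
    # Scale the per-1,000-page rate to the page count and round to cents.
    return (rate * pages / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for pages in (1_000, 250_000, 1_000_000):
        standard = estimate_cost(pages)
        batched = estimate_cost(pages, batch=True)
        print(f"{pages:>9,} pages: ${standard} standard, ${batched} batch")
```

Under these assumptions, a 250,000-page archive would come to $1,000 at the standard rate and $500 with the batch discount.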
What they said
"The availability of Mistral Document AI with OCR 4 in Microsoft Foundry marks an important milestone in our partnership." — Kimmi Grewal, VP of AI Ecosystem Partnerships, Microsoft
Why it matters
- Privacy as a feature. Self-hosting means sensitive documents never leave the enterprise.
- Cost and speed. One customer cited roughly 8× lower cost and 17× lower latency than agentic parsers.
- A European challenger. Mistral targets Google and AWS document APIs head-on.
FAQ
What is Mistral OCR 4?
A structure-aware document-AI model from Mistral AI, released June 23, 2026, that extracts text and structure from documents in 170 languages at $4 per 1,000 pages. It can run fully self-hosted in a single container, and it posted top scores on document-AI benchmarks including OlmOCRBench (85.20) and OmniDocBench (93.07).
Why does self-hosting matter for document AI?
It lets enterprises process sensitive documents on their own infrastructure, so data never leaves their control. That is a key differentiator versus cloud-only document APIs from Google and AWS, alongside the lower cost and latency that one customer cited.
Sources
Image: Mistral AI logo by Mistral AI — Public domain, via Wikimedia Commons.