
Mistral released OCR 4: bounding boxes, 170 languages, and self-hosted deployment


AI-processed from Mistral AI News; edited by Hamidun News

Mistral AI has released OCR 4, an engine for intelligent processing of corporate documents. Unlike its predecessors, the model returns not just the extracted text but a complete structured map of the document: block coordinates, block types, and confidence scores for each word.

What changed in the fourth version

The most sought-after innovation is bounding boxes: each text block now comes with precise coordinates on the page. This lets downstream systems highlight cited sources directly in the interface, build reliable data pipelines, and implement human-in-the-loop verification for sensitive documents. Previously, most OCR solutions returned "flat" text with no link to its position on the page.
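To illustrate how a bounding box can drive highlighting, here is a minimal sketch. The article does not describe the response schema, so the `BBox` class, its normalized 0 to 1 coordinates, and the top-left origin are all assumptions made for illustration, not the documented OCR 4 output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical box: fractions of page width/height, origin at top-left."""
    x0: float
    y0: float
    x1: float
    y1: float


def to_pixels(box: BBox, page_width_px: int, page_height_px: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to (left, top, right, bottom) in pixels."""
    return (
        round(box.x0 * page_width_px),
        round(box.y0 * page_height_px),
        round(box.x1 * page_width_px),
        round(box.y1 * page_height_px),
    )


# Rectangle to draw over a page rendered at 1000 x 2000 pixels.
highlight = to_pixels(BBox(0.125, 0.25, 0.875, 0.5), page_width_px=1000, page_height_px=2000)
print(highlight)  # (125, 500, 875, 1000)
```

If the real output uses absolute units or a different origin, only `to_pixels` would need to change; the interface code that draws the rectangle stays the same.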

In addition to coordinates, OCR 4 classifies each block by type: heading, subheading, paragraph, table, equation, or image caption. Combined with inline confidence scores, reported at the page level and for each individual word, this enables new scenarios: citation with precise source attribution, automatic redaction of confidential data, and operator-driven verification of results.
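Word-level confidence makes operator review straightforward to route. The sketch below sends any block containing a low-confidence word to a manual queue. The `Word` and `Block` classes, the 0 to 1 confidence scale, and the 0.9 threshold are hypothetical choices for illustration; the article does not specify the actual field names or score range.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class Block:
    block_type: str  # e.g. "heading", "paragraph", "table"
    words: list[Word]


def needs_review(block: Block, threshold: float = 0.9) -> bool:
    """True if any word in the block scores below the threshold."""
    return any(word.confidence < threshold for word in block.words)


def split_for_review(blocks: list[Block], threshold: float = 0.9) -> tuple[list[Block], list[Block]]:
    """Partition blocks into (auto-accepted, sent to a human operator)."""
    auto: list[Block] = []
    manual: list[Block] = []
    for block in blocks:
        (manual if needs_review(block, threshold) else auto).append(block)
    return auto, manual


blocks = [
    Block("heading", [Word("Invoice", 0.99)]),
    Block("paragraph", [Word("Total:", 0.98), Word("1,280.00", 0.62)]),
]
auto, manual = split_for_review(blocks)
print(len(auto), len(manual))  # 1 1
```

In practice the threshold would be tuned per document type: a stricter value for amounts and identifiers, a looser one for body text.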

RAG pipelines benefit in particular: classified blocks serve as high-quality retrieval units, and agents gain the ability not just to read documents but to act on them, for example by filling out forms, processing invoices, or performing compliance checks.
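One way typed blocks can become retrieval units is to attach each content block to the heading above it, so every chunk carries its section and page. This is a sketch under assumptions: the dictionary keys (`type`, `text`, `page`) and the type names are hypothetical stand-ins for whatever the real output provides.

```python
from typing import Optional

HEADING_TYPES = {"heading", "subheading"}


def to_retrieval_units(blocks: list[dict]) -> list[dict]:
    """Attach each non-heading block to the most recent heading before it."""
    units: list[dict] = []
    current_section: Optional[str] = None
    for block in blocks:
        if block["type"] in HEADING_TYPES:
            current_section = block["text"]
            continue  # headings label chunks; they are not chunks themselves
        units.append(
            {
                "section": current_section,
                "type": block["type"],
                "page": block["page"],
                "text": block["text"],
            }
        )
    return units


blocks = [
    {"type": "heading", "text": "Payment terms", "page": 1},
    {"type": "paragraph", "text": "Invoices are due within 30 days.", "page": 1},
    {"type": "table", "text": "Item | Qty | Price", "page": 2},
]
units = to_retrieval_units(blocks)
print(units[0]["section"], units[1]["type"])  # Payment terms table
```

Keeping the block type in each unit lets a retriever treat tables and paragraphs differently, for instance by embedding tables with a separate strategy.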

Technical characteristics and pricing

OCR 4 accepts standard corporate formats (PDF, DOC, PPT, OpenDocument) and supports 170 languages across 10 language groups. Mistral specifically highlights quality gains for rare and low-resource languages, where most competing systems degrade noticeably.

Key capabilities:

  • Bounding boxes — precise localization of each block on the page
  • Block typing — headings, tables, equations, captions, images
  • Confidence scores — at the page level and for each word
  • 170 languages in 10 language groups, including low-resource ones
  • Single-container deployment — the entire model fits in one container

API pricing is $4 per thousand pages. Batch processing through the Batch API carries a 50% discount, bringing the price to $2 per thousand pages. Document AI in Mistral Studio (a no-code interface) is priced at $5 per thousand pages.
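As a worked example of these rates, the sketch below estimates the cost of a 250,000-page archive under each option. It assumes strictly linear pricing with no volume tiers, minimums, or rounding rules, none of which the article mentions; the constant and function names are illustrative.

```python
from decimal import Decimal

# Rates quoted in the article, in USD per 1,000 pages.
API_PRICE_PER_1K = Decimal("4")
STUDIO_PRICE_PER_1K = Decimal("5")
BATCH_DISCOUNT = Decimal("0.5")  # 50% off the API rate


def cost_usd(pages: int, price_per_1k: Decimal, discount: Decimal = Decimal("0")) -> Decimal:
    """Linear cost estimate; assumes no tiers or minimum charges."""
    return Decimal(pages) / 1000 * price_per_1k * (1 - discount)


pages = 250_000
print(cost_usd(pages, API_PRICE_PER_1K))                  # 1000
print(cost_usd(pages, API_PRICE_PER_1K, BATCH_DISCOUNT))  # 500.0
print(cost_usd(pages, STUDIO_PRICE_PER_1K))               # 1250
```

`Decimal` is used instead of `float` so that currency amounts stay exact.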

Self-hosted deployment in a single container is available to corporate clients who prioritize data sovereignty, regulatory compliance, and high-performance batch processing. The compact model size makes it suitable for both budget-constrained and high-load scenarios.

Benchmarks and integrations

Independent annotators preferred OCR 4 over all other tested OCR and Document AI systems, with an average win rate of 72%. On the public OlmOCRBench benchmark, the model scored 85.20, the best result among the tested solutions at the time of publication.

"Downstream systems gain access not only to what is written in the document, but also to where each element is located, what role it plays, and how confident the model is in each page area," is how Mistral describes the philosophy of the release.

OCR 4 is integrated into Mistral Search Toolkit, an open framework for enterprise search announced at the AI Now Summit. It serves as the ingestion component for RAG pipelines and enterprise search: the model's structured output becomes citation-ready input for retrieval, scoring, and reranking systems.

What this means

Mistral is turning document recognition from an auxiliary utility into an infrastructure primitive for corporate AI systems. Structured output with coordinates, block types, and confidence scores is exactly the level of detail that agentic systems need to work reliably with real documents. Teams building RAG platforms and document intelligence solutions get a ready-made component that requires no additional post-processing.

