
How to automate reading engineering drawings: 6 YOLO models instead of manual work

A system of 6 YOLO models and custom OCR automatically extracts all parameters that affect cost from engineering drawings: dimensions, threads, material, tolerances.

Source: Habr AI. Collage: Hamidun News.

Extracting data from engineering drawings manually is a tedious job prone to errors. When ordering custom part manufacturing, you need to manually enter approximately 20 parameters from the drawing into a calculator: dimensions, threads, tolerances, surface roughness, allowances, material, weight. One engineering team assembled an automated pipeline that takes a PDF drawing, reads it like a human would, and extracts everything needed in structured form. The output is JSON for the calculator.

Solution Architecture

The system works in three steps: data localization, text recognition, result synthesis. A PDF drawing comes in, JSON with parameters comes out. Intermediate stages:

  • Resolution and contrast normalization
  • Projection extraction (front, side, top views)
  • Separation of part outline from auxiliary lines
  • Localization of text fields and dimension arrows
  • Symbol recognition (thread, tolerance, surface roughness)
  • Linking arrows to their values through a connectivity graph
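The normalization stage at the top of that list can be sketched with a simple percentile contrast stretch. This is an illustrative stand-in, not the authors' code; the function name and percentile thresholds are assumptions.

```python
import numpy as np

def normalize_contrast(img: np.ndarray, lo_pct: float = 2.0, hi_pct: float = 98.0) -> np.ndarray:
    """Stretch grayscale intensities so the lo/hi percentiles map to 0..255.

    A rough stand-in for the pipeline's "resolution and contrast
    normalization" step (illustrative, not the original implementation).
    """
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return img.astype(np.uint8)
    out = (img.astype(np.float32) - lo) / (hi - lo)
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```

A faded copier scan whose pixels sit in a narrow band (say 100..150) comes out spanning the full 0..255 range, which makes later detection stages less sensitive to scan quality.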

Pipeline Components

Six specialized YOLO models are used for computer vision. Each is trained on a subset of 500+ real production drawings:

1. Projection detection — finds front, side, and top views in the drawing.
2. Dimension localization — highlights all dimension arrows and text fields.
3. Special symbol recognition — reads thread designations (M10), tolerance grades (IT6), and surface roughness (Ra 3.2).
4. Part outline — separates the visible outline from auxiliary lines.
5. Auxiliary lines — finds centerlines and auxiliary construction elements.
6. Arrows and pointers — localizes all types of arrows and associated text values.
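Running six specialist detectors over the same sheet and pooling their boxes can be sketched as follows. The detector stubs, `Detection` fields, and the 0.5 confidence threshold are all illustrative assumptions; a real system would wrap six trained YOLO checkpoints (e.g. via the ultralytics API) behind this interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Detection:
    model: str        # which of the six detectors produced this box
    label: str        # e.g. "front_view", "dimension_arrow"
    box: tuple        # (x1, y1, x2, y2) in pixels
    score: float      # detector confidence

def run_all_detectors(image, detectors: dict[str, Callable]) -> list[Detection]:
    """Run every specialist model on the same sheet and pool the results.

    `detectors` maps a model name to a callable returning
    (label, box, score) triples -- a hypothetical interface, not the
    authors' actual code.
    """
    results: list[Detection] = []
    for name, detect in detectors.items():
        for label, box, score in detect(image):
            results.append(Detection(name, label, box, score))
    # Drop low-confidence boxes; the threshold is illustrative.
    return [d for d in results if d.score >= 0.5]
```

Keeping the six models behind one callable interface means each can be retrained or swapped independently without touching the downstream synthesis step.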

A custom OCR model is paired with YOLO, because off-the-shelf solutions struggle to read handwritten notes and special symbols such as ∅ (diameter) and thread designation conventions. The network was trained on a dataset with expert annotations.

Arrow logic is modeled as a weighted graph: if an arrow starts at geometric element A, passes through object B, and ends near text C, then value C belongs to element A. In practice it is messier: arrows can be dashed or S-shaped, and multiple arrows can point to the same location, causing ambiguity.

Reality Gets in the Way

Testing on production drawings revealed problems that don't exist in ideal datasets:

  • Dirty scans — drawings from 20 years ago, scans from copy machines, water stains, random pencil marks.
  • Typographic liberties — threads can be written as "Ø10×1.5", "M10" or even drawn as a spring.
  • Colored annotations — dimensions highlighted in red pen, but OCR often filters red lines as noise.
  • Overcrowded sheets — 30+ dimensions on one drawing, arrows intersect, creating confusion.

The solution came from data augmentation: synthetic drawings were generated with added noise, garbage, contrast changes, and old scan imitation. After training on the expanded dataset, quality on dirty drawings improved from 68% to 92%.
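A minimal sketch of that kind of scan degradation, using only NumPy: the noise level, contrast range, and blotch sizes are invented for illustration, not the article's actual augmentation parameters.

```python
import numpy as np

def degrade_scan(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Imitate an old, dirty scan for training-data augmentation.

    Illustrative version of the augmentations described in the article:
    reduced contrast, copier grain, and random dark blotches (stains,
    pencil marks). Expects a 2-D grayscale uint8 image.
    """
    out = img.astype(np.float32)
    # 1. Lower contrast: pull intensities toward mid-gray.
    alpha = rng.uniform(0.5, 0.9)
    out = 127.5 + alpha * (out - 127.5)
    # 2. Additive Gaussian noise (copier grain).
    out += rng.normal(0.0, 8.0, size=out.shape)
    # 3. A few random dark circular blotches.
    h, w = out.shape
    for _ in range(rng.integers(1, 4)):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        r = int(rng.integers(5, 20))
        y, x = np.ogrid[:h, :w]
        out[(y - cy) ** 2 + (x - cx) ** 2 <= r * r] -= 60.0
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying a transform like this to clean synthetic drawings gives the model "20-year-old scans" at training time without collecting them, which is what lifted quality on dirty drawings in the article's account.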

What This Means

Automating drawing reading is an example of how manual labor is replaced by a combination of publicly available tools (YOLO), engineering logic (the arrow graph), and specialized fine-tuning. For manufacturing it means a 15x speedup: 2 minutes on autopilot instead of 30 minutes of manual input. For the business, it means faster quotes without manual data entry.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.