LLM-based system cut quality control map preparation at a metallurgical plant from 2 hours to 5 minutes
At a metallurgical plant, an LLM-based system was taught to read GOST scans and assemble quality control maps in 3–5 minutes instead of two hours. A universal…
AI-processed from Habr AI; edited by Hamidun News
A practical case study from a metallurgical enterprise shows that large language models can already do more than answer questions: they can relieve engineers of heavy regulatory routine work. The system was taught to read scans of Soviet GOSTs and automatically assemble control maps in 3–5 minutes instead of more than two hours of manual work. According to the project author's estimate, this removes a workload equivalent to that of three process engineers.
The problem was concrete. The full-cycle enterprise employs approximately 3,200 people, and its product range exceeds 4,500 items and keeps growing. For each product, a technologist had to open a GOST, OST, or other regulatory document, find the relevant tables and notes, substitute parameters such as steel grade, blank diameter, and group, and then manually fill in more than 40 control parameters.
The difficulty was that there were more than two hundred such documents, and a significant share of them existed only as awkward Soviet-era PDF scans. The first idea, a conventional parser, quickly fell through because the regulatory documents were too heterogeneous: in one document the required values sat in table rows, in another they were hidden in notes, and in a third they were scattered through the text with cross-references to other sections.
A simple template-based extraction scheme breaks down here because the task requires understanding the meaning of the document, not just its visual layout. The author therefore turned to LLMs and built a pipeline in which the model received three inputs: a PDF of the standard, the product parameters, and a set of rules for determining specific control parameters. The next hypothesis seemed logical: write a single universal prompt for all regulatory documents.
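The three-part input described above can be sketched as a small prompt builder. This is a minimal illustration, not the author's code: `Product`, `Rule`, `build_prompt`, the field names, and the prompt wording are all hypothetical, and the PDF scan is assumed to be attached to the model request separately rather than pasted into the text.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical product description; the field names are illustrative."""
    steel_grade: str
    blank_diameter_mm: float
    group: str


@dataclass
class Rule:
    """One instruction telling the model where and how to find a parameter."""
    parameter: str
    instruction: str


def build_prompt(standard_id: str, product: Product, rules: list[Rule]) -> str:
    """Assemble the text part of a request; the PDF scan is attached separately."""
    lines = [
        f"The attached file is a scan of {standard_id}.",
        "Product parameters:",
        f"- steel grade: {product.steel_grade}",
        f"- blank diameter, mm: {product.blank_diameter_mm}",
        f"- group: {product.group}",
        "Determine each control parameter below by following its rule.",
        "For every value, cite the table, clause, or note it was taken from.",
    ]
    # Number the rules so that the answer can refer back to them.
    for number, rule in enumerate(rules, start=1):
        lines.append(f"{number}. {rule.parameter}: {rule.instruction}")
    return "\n".join(lines)
```

Keeping the product parameters and the rules as separate structures means the same builder works whether the rules live in a prompt file or, later, in an editable table.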
In the first test, this approach showed decent results. According to the author, Claude Sonnet 4.6 in thinking mode correctly identified 85% of the parameters for one GOST, and GPT 5.4 identified 72%. On subsequent documents, however, universality fell apart. The models got confused by nested tables, misinterpreted boundary conditions such as "no less than" and "no more than," and sometimes missed constants or links between sections.
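One way to guard against the boundary-condition mistakes mentioned above is to convert such phrases into explicit inclusive limits in ordinary code and check the model's values against them. The sketch below is an assumption, not part of the described system: `Bound` and `parse_bound` are hypothetical names, only single-limit phrases with a bare number are handled, and the Russian forms "не менее" / "не более" are included because the original GOSTs are written in Russian.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Phrases seen in English translations and in the Russian originals of GOSTs.
_LOWER = r"(?:no less than|not less than|не менее)"
_UPPER = r"(?:no more than|not more than|не более)"
_NUMBER = r"(\d+(?:[.,]\d+)?)"


@dataclass(frozen=True)
class Bound:
    low: Optional[float] = None   # inclusive lower limit
    high: Optional[float] = None  # inclusive upper limit

    def contains(self, value: float) -> bool:
        """Both limits are inclusive: "no less than 10" accepts exactly 10."""
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


def _to_float(text: str) -> float:
    # Soviet-era documents write decimals with a comma, e.g. "0,35".
    return float(text.replace(",", "."))


def parse_bound(phrase: str) -> Bound:
    """Turn a single-limit requirement phrase into an explicit inclusive Bound."""
    text = " ".join(phrase.lower().split())  # normalise case and whitespace
    match = re.fullmatch(_LOWER + " " + _NUMBER, text)
    if match:
        return Bound(low=_to_float(match.group(1)))
    match = re.fullmatch(_UPPER + " " + _NUMBER, text)
    if match:
        return Bound(high=_to_float(match.group(1)))
    raise ValueError(f"unrecognised requirement: {phrase!r}")
```

For example, `parse_bound("No more than 0,35").contains(0.35)` is true, because the limit itself is allowed. Unrecognised phrasings raise an error instead of being guessed, so they can be routed to a human.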
The elegant general approach proved inferior here to a narrower but more manageable architecture. The working solution emerged after the product range was broken down according to the Pareto principle: 80% of the plant's products are described by approximately 18% of the GOSTs.
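The Pareto selection can be expressed as a short ranking routine: sort standards by how many products they govern and take them in order until the target share is covered. This is a sketch under a stated simplification: `pareto_core` is a hypothetical helper, and it assumes each product is governed by exactly one standard, so coverage counts can simply be added.

```python
def pareto_core(products_per_standard: dict[str, int], share: float = 0.8) -> list[str]:
    """Return the most-used standards, in order, until they cover `share` of products.

    Assumes each product is governed by exactly one standard, so coverage
    counts can simply be added up.
    """
    total = sum(products_per_standard.values())
    ranked = sorted(products_per_standard.items(), key=lambda item: item[1], reverse=True)
    chosen: list[str] = []
    covered = 0
    for standard, count in ranked:
        if covered >= share * total:
            break  # target share already reached
        chosen.append(standard)
        covered += count
    return chosen
```

If one product can reference several standards, the counts overlap and the problem becomes a set-cover selection instead, which this simple version does not handle.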
For the pilot, the team selected the 20 most frequently used documents and wrote a separate prompt for each, with specific rules: where to find each parameter, which table or section describes it, and how to interpret contentious cases. The system took a PDF and the product characteristics as input and returned a table listing each parameter, its value, and a reference to its location in the standard. When an error occurred, the author sent a screenshot to the chat, showed the correct answer, and asked the model to update the rule so the failure would not recur.
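The parameter/value/source output also makes the error-correction loop measurable. A minimal sketch, assuming a technologist keeps reference answers for a test product: `ExtractedRow`, `review`, and `accuracy` are hypothetical names, and values are compared as exact strings, which is a simplification. A per-parameter share like this is one plausible way to express figures such as "85% of parameters," though the source does not say how the author scored them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedRow:
    parameter: str
    value: str
    source: str  # where in the standard the value was found, e.g. "Table 2, note 3"


def review(rows: list[ExtractedRow], reference: dict[str, str]) -> dict[str, list[str]]:
    """Compare model output with reference answers prepared by a technologist."""
    by_parameter = {row.parameter: row for row in rows}
    report: dict[str, list[str]] = {"missing": [], "wrong": [], "unsourced": []}
    for parameter, expected in reference.items():
        row = by_parameter.get(parameter)
        if row is None:
            report["missing"].append(parameter)
        elif row.value.strip() != expected.strip():
            report["wrong"].append(parameter)
    # A value without a location in the standard cannot be checked by a human.
    for row in rows:
        if not row.source.strip():
            report["unsourced"].append(row.parameter)
    return report


def accuracy(rows: list[ExtractedRow], reference: dict[str, str]) -> float:
    """Share of reference parameters that the model got exactly right."""
    if not reference:
        return 0.0
    report = review(rows, reference)
    errors = len(report["missing"]) + len(report["wrong"])
    return (len(reference) - errors) / len(reference)
```

Each "missing" or "wrong" entry points to a rule that needs another iteration, which mirrors the screenshot-and-correct cycle described above.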
Refinement took nine iterations and 14 working days, after which parameter extraction for the selected GOSTs worked without errors. The project is now moving from experimental mode to a more production-ready format. Rules are being moved out of the prompts into an Excel spreadsheet so that technologists can edit the logic themselves without having to learn prompt engineering.
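A rules spreadsheet can be loaded and grouped by standard so that each request carries only the rules for its own document. This sketch assumes the sheet is exported to CSV (reading `.xlsx` directly would need a third-party library such as openpyxl) and that it has hypothetical columns `standard`, `parameter`, `location`, and `rule`; `load_rules` is an illustrative name. It accepts an open file or any iterable of CSV lines.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Hypothetical column layout of the rules sheet.
REQUIRED_COLUMNS = ("standard", "parameter", "location", "rule")


def load_rules(lines: Iterable[str]) -> dict[str, list[dict[str, str]]]:
    """Group rule rows by standard so each request only carries its own rules."""
    reader = csv.DictReader(lines)
    missing = [name for name in REQUIRED_COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"rules table is missing columns: {missing}")
    rules: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in reader:
        # Short rows yield None for absent cells; treat them as empty strings.
        cleaned = {name: (row[name] or "").strip() for name in REQUIRED_COLUMNS}
        if not cleaned["standard"] or not cleaned["parameter"]:
            continue  # skip incomplete rows instead of sending them to the model
        rules[cleaned["standard"]].append(cleaned)
    return dict(rules)
```

Validating the header up front means a technologist who renames a column gets an immediate error rather than silently empty rules.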
The model now receives this rules table in addition to the PDF and the product parameters, and returns data in a format that can be loaded into the enterprise's internal information system. This layer makes the solution scalable: new standards can be added gradually without tying all support to a single developer. The main takeaway from this case study is simple: in industry, LLMs work better not as a "universal intelligence for all occasions" but as a carefully tuned tool for a specific class of documents.
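Returning data in a loadable format implies a validation step between the model and the information system. The sketch below is an assumption about how that might look: the model is asked to answer with a JSON array, each row is checked for `parameter`, `value`, and `source`, and the rows are wrapped in an import record whose layout (`standard`, `product`, `parameters`) is hypothetical, as are the names `ControlMapRow`, `parse_model_output`, and `to_payload`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ControlMapRow:
    parameter: str
    value: str
    source: str


def parse_model_output(raw: str) -> list[ControlMapRow]:
    """Validate the model's JSON answer before it reaches the information system."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of rows")
    rows = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"row {index} is not an object")
        try:
            rows.append(ControlMapRow(str(item["parameter"]), str(item["value"]), str(item["source"])))
        except KeyError as exc:
            raise ValueError(f"row {index} lacks field {exc}") from None
    return rows


def to_payload(standard_id: str, product_code: str, rows: list[ControlMapRow]) -> str:
    """Wrap validated rows in a hypothetical import record for the internal system."""
    record = {
        "standard": standard_id,
        "product": product_code,
        "parameters": [asdict(row) for row in rows],
    }
    # ensure_ascii=False keeps Cyrillic values readable in the exported file.
    return json.dumps(record, ensure_ascii=False, indent=2)
```

Rejecting a malformed answer outright is deliberate: a partially loaded control map is harder to spot than a failed import.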
If you start with the most common regulatory documents, require the model to cite the source of each parameter, and keep the rules editable by subject-matter experts, AI turns from a demo into a production mechanism with clear economics. The same approach could well be repeated beyond metallurgy, in mechanical engineering, construction, chemistry, pharmaceuticals, and energy: anywhere people still manually transfer data from regulations into working systems.