Rosatom: nuclear plant digital twin shaved 19 days off reactor refueling
Background
Rosatom operates 11 nuclear plants in Russia with 38 power units. Each unit produces about 7 TWh per year, worth roughly 21 billion rubles. Every 12-18 months each unit shuts down for planned maintenance and refueling, known as PPR (Planned-Preventive Repair). A PPR averages 53 days, and each day of downtime costs 58 million rubles in lost revenue, plus the cost of replacement power. In 2024 alone, Rosatom ran 9 PPRs, for 477 days of cumulative downtime.
Problem
PPR planning means orchestrating 8,400 operations and 14 contractors, with about 3,000 people on site at once. Each operation has dependencies: steam-generator disassembly can't start until pressure is released; equipment can't be brought in until the corridor is clear. Planning used to live in one giant MS Project file, and workers had to dig through Excel every time they needed to see who was waiting for whom.
First problem: in practice, 18% of operations slipped because of unforeseen delays: equipment damage, weather, contractor issues, or unexpected findings during disassembly (e.g., corrosion where there shouldn't be any). Each delay cascaded along the critical path. The plan went stale within 3-4 days, while the repair lasted 50+.
Second problem: under-utilization of personnel. On average 12% of specialist time was spent "waiting for the area to be free". A senior welder paid 14,000 rubles per hour just stood in the locker room.
Solution
Rosatom built a "digital twin" of each plant unit as a living model. It is not just a 3D visualization (what everyone expects when hearing "digital twin") but a full dependency graph of all 8,400 PPR operations, fed with live data from two sources: repair crews mark operation progress in real time through a mobile app, and sensors on critical equipment report its actual physical state.
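To make the "dependency graph plus live feed" idea concrete, here is a minimal sketch in Python. The operation names and the `ready_operations` helper are hypothetical illustrations built from the examples in this article; they are not Rosatom's data model.

```python
from graphlib import TopologicalSorter

# Hypothetical operations: each maps to the set of operations it must wait for.
DEPENDENCIES = {
    "release_pressure": set(),
    "clear_corridor": set(),
    "disassemble_steam_generator": {"release_pressure"},
    "bring_in_equipment": {"clear_corridor"},
    "inspect_tubes": {"disassemble_steam_generator", "bring_in_equipment"},
}


def ready_operations(dependencies, completed):
    """Return unfinished operations whose prerequisites are all completed."""
    return sorted(
        op
        for op, prereqs in dependencies.items()
        if op not in completed and prereqs <= completed
    )


# A valid execution order exists only if the graph has no cycles;
# static_order() raises graphlib.CycleError otherwise.
execution_order = list(TopologicalSorter(DEPENDENCIES).static_order())

# Assumed live feed: crews have marked these operations as finished.
completed_so_far = {"release_pressure"}
print(ready_operations(DEPENDENCIES, completed_so_far))
```

With only the pressure release marked as done, the sketch reports that corridor clearing and steam-generator disassembly can start, while equipment delivery and tube inspection are still blocked.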
Three models run on top of the digital twin; only the second and third are ML. First, the re-planner: every 15 minutes it recomputes the optimal plan for the remaining operations given the current real state. It uses mixed-integer programming (Google OR-Tools solver with custom branch-and-bound) over 8,400 variables and 22,000 constraints, and solves in 8-12 minutes on 96 vCPUs, which fits inside the 15-minute cycle.
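The production re-planner is a mixed-integer program, which is not reproduced here. As a much simpler stand-in, the sketch below shows only the re-planning idea: take the actual finish times reported so far and recompute the earliest start and finish of every remaining operation with a forward pass over the dependency graph. It ignores crews, areas and other resource constraints, which is exactly what the real solver has to handle. All names, durations and times are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical durations in hours and prerequisite sets (not the real plan).
DURATIONS = {
    "release_pressure": 6,
    "clear_corridor": 4,
    "disassemble_steam_generator": 30,
    "bring_in_equipment": 8,
    "inspect_tubes": 12,
}
DEPENDENCIES = {
    "release_pressure": set(),
    "clear_corridor": set(),
    "disassemble_steam_generator": {"release_pressure"},
    "bring_in_equipment": {"clear_corridor"},
    "inspect_tubes": {"disassemble_steam_generator", "bring_in_equipment"},
}


def replan(durations, dependencies, finished_at, now):
    """Forward pass: earliest (start, finish) for every unfinished operation.

    finished_at maps completed operations to their actual finish time.
    Simplification: every unfinished operation starts no earlier than `now`.
    """
    finish = dict(finished_at)
    schedule = {}
    for op in TopologicalSorter(dependencies).static_order():
        if op in finished_at:
            continue
        start = max([now] + [finish[p] for p in dependencies[op]])
        finish[op] = start + durations[op]
        schedule[op] = (start, finish[op])
    return schedule


# Pressure release finished 3 hours late (hour 9 instead of 6); it is now hour 10.
schedule = replan(DURATIONS, DEPENDENCIES, {"release_pressure": 9}, now=10)
for op, (start, end) in schedule.items():
    print(f"{op}: start {start}h, finish {end}h")
```

Running a pass like this on every cycle keeps the downstream dates tied to what has actually happened on site rather than to the original baseline.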
Second, the predictor: it forecasts the delay risk of each operation from history (where and at which stage delays happened before) and current parameters (crew experience, equipment condition, parts availability in block storage). This gives section leaders a "risk map" for the next day, showing where to be especially attentive.
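The article does not describe the predictor's model, so the following is only an illustrative baseline, not the actual method: a smoothed historical delay rate per operation type, pulled toward the 18% overall slip rate mentioned earlier when there is little history. It uses history only and leaves out the current parameters. The operation types, the history records and the `prior_weight` value are assumptions.

```python
from collections import defaultdict

# Hypothetical history: (operation type, whether it was delayed).
HISTORY = [
    ("welding", True),
    ("welding", False),
    ("welding", True),
    ("inspection", False),
]


def delay_risk(history, prior_rate=0.18, prior_weight=10):
    """Smoothed delay probability per operation type.

    prior_rate is the overall slip rate; prior_weight (assumed) says how many
    observations the prior counts for, so sparse types stay near the average.
    """
    counts = defaultdict(lambda: [0, 0])  # op_type -> [delayed, total]
    for op_type, was_delayed in history:
        counts[op_type][0] += int(was_delayed)
        counts[op_type][1] += 1
    return {
        op_type: (delayed + prior_rate * prior_weight) / (total + prior_weight)
        for op_type, (delayed, total) in counts.items()
    }


risk = delay_risk(HISTORY)
# "Risk map": operation types ordered from most to least likely to slip.
risk_map = sorted(risk.items(), key=lambda item: item[1], reverse=True)
for op_type, probability in risk_map:
    print(f"{op_type}: {probability:.2f}")
```

A real predictor would add the current parameters as features; the point of the sketch is only the shape of the output, a ranked list that tells section leaders where to look first.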
Third, the anomaly detector: it catches unexpected events in the data feed (e.g., a crew starting an operation that is not yet scheduled to start) and automatically sends an alert to the PPR command center. This cut mean reaction time from 3 hours to 12 minutes.
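The simplest form of the "operation started too early" check can be written as a plain rule over the event feed. The sketch below is rule-based rather than ML and is not the actual detector; the event format, crew names and operation names are hypothetical.

```python
def find_premature_starts(events, dependencies):
    """Return an alert for every 'start' event whose prerequisites are not all done.

    events: chronological iterable of (timestamp, crew, operation, status),
    where status is "start" or "done".
    """
    completed = set()
    alerts = []
    for timestamp, crew, operation, status in events:
        if status == "done":
            completed.add(operation)
        elif status == "start":
            missing = dependencies.get(operation, set()) - completed
            if missing:
                alerts.append(
                    {
                        "time": timestamp,
                        "crew": crew,
                        "operation": operation,
                        "missing": sorted(missing),
                    }
                )
    return alerts


# Hypothetical rule: circuit A may be opened only after both circuits are depressurized.
DEPENDENCIES = {
    "disassemble_circuit_A": {"depressurize_circuit_A", "depressurize_circuit_B"},
}
EVENTS = [
    ("08:00", "crew_1", "depressurize_circuit_A", "start"),
    ("09:30", "crew_1", "depressurize_circuit_A", "done"),
    ("09:45", "crew_2", "disassemble_circuit_A", "start"),
]

alerts = find_premature_starts(EVENTS, DEPENDENCIES)
for alert in alerts:
    print(alert)
```

In this example the second crew's start is flagged because the adjacent circuit has not been reported as depressurized.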
Result
Average PPR duration shrank from 53 days to 34, saving 19 days per repair. Across the 9 repairs in 2024, that adds up to 171 days of additional generation, valued at a reported 38 billion rubles. Specialist utilization rose from 88% to 96%: the same people get more work done.
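The headline numbers can be checked with a few lines of arithmetic, using only figures stated in this article. Note that the 58 million rubles per day quoted in the Background covers lost revenue only; the 38 billion total implies a much higher value per saved day, so it presumably also includes replacement-power costs or other effects that the article does not break out.

```python
days_before, days_after = 53, 34
repairs_2024 = 9
lost_revenue_per_day = 58_000_000  # rubles per day, lost revenue only
reported_total = 38_000_000_000    # rubles, as reported in the article

days_saved_per_repair = days_before - days_after              # 19 days
total_days_saved = days_saved_per_repair * repairs_2024       # 171 days
revenue_recovered = total_days_saved * lost_revenue_per_day   # lost revenue alone

# Value per saved day implied by the reported total.
implied_value_per_day = reported_total / total_days_saved

print(f"Days saved in 2024: {total_days_saved}")
print(f"Lost revenue recovered: {revenue_recovered / 1e9:.2f} billion rubles")
print(f"Implied value per saved day: {implied_value_per_day / 1e6:.0f} million rubles")
```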
The number of "cascading" delays (one slipped operation pushing back 5 or more others) dropped from 47 per PPR to 6. The 15-minute re-planner doesn't let slippages accumulate; they are isolated and handled locally.
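One way to reason about cascading delays is to count how many operations sit downstream of a slipped one in the dependency graph. The sketch below does this with a breadth-first search. It gives an upper bound, since a downstream operation with enough slack would not actually move. The graph and the use of the 5-operation threshold as a flag are illustrative assumptions.

```python
from collections import deque

# Hypothetical graph: each operation maps to its prerequisites.
DEPENDENCIES = {
    "a": set(),
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
    "e": {"c"},
    "f": {"e"},
}


def downstream_operations(dependencies, slipped):
    """Return every operation that transitively depends on `slipped`."""
    successors = {}
    for op, prereqs in dependencies.items():
        for prereq in prereqs:
            successors.setdefault(prereq, set()).add(op)
    affected = set()
    queue = deque([slipped])
    while queue:
        for nxt in successors.get(queue.popleft(), ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


CASCADE_THRESHOLD = 5  # "slides 5+ others", as defined in the article

for slipped in ("a", "e"):
    affected = downstream_operations(DEPENDENCIES, slipped)
    label = "potential cascade" if len(affected) >= CASCADE_THRESHOLD else "local"
    print(f"{slipped}: {len(affected)} downstream operations ({label})")
```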
Side effect: safety. The anomaly detector, with its 12-minute mean reaction time, prevented two potentially dangerous incidents in 2024. In one case, a crew started disassembling an auxiliary circuit without verifying that the adjacent one was depressurized. The alert fired in 67 seconds and the site was stopped in 4 minutes. Without the automated alert, this could have escalated.
Key implementation challenge: culture. Brigadiers (crew foremen) with 30 years of experience don't like "the computer" reordering their work. The team settled on a deliberate framing: "the computer doesn't give orders, it offers alternatives; the brigadier decides". Recommendation acceptance grew from 23% (first 3 months) to 81% (after 14 months of operation).
Lessons learned
- A "digital twin" is a dependency graph + live data feed, not a 3D picture. 3D is the UI on top.
- A 15-minute re-planner beats a powerful once-a-day scheduler. Plan drift accumulates faster than intuition suggests.
- Constraint solvers (OR-Tools) beat ML for rigidly structured problems. ML is added for prediction + anomalies.
- Anomaly detection with a 12-minute reaction time prevents situations where a 3-hour response would come too late. This matters especially in nuclear.
- "AI suggests, human decides" — mandatory frame for critical operations. Otherwise adoption is zero.