Rosatom: nuclear plant digital twin shaved 19 days off reactor refueling

The average duration of a planned maintenance and refueling outage (PPR) shrank from 53 days to 34, saving 19 days per outage. Across the 9 PPRs run in 2024, that adds up to 171 days of additional generation worth 38 billion rubles. Specialist utilization rose from 88% to 96%: the same people do more work.

-19 days: PPR duration
₽38B: extra generation per year
88→96%: people utilization
12 min: anomaly reaction time

Background

Rosatom operates 11 nuclear plants in Russia with 38 power units. Each unit produces about 7 TWh per year, worth roughly 21 billion rubles. Every 12-18 months each unit shuts down for planned maintenance and refueling, known as PPR (planned preventive repair). A PPR averages 53 days. Each day of downtime costs 58 million rubles in lost revenue, plus power-replacement costs. In 2024 alone, Rosatom ran 9 PPRs, for 477 days of cumulative downtime.
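The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are illustrative, and spreading revenue evenly over a 365-day year is an assumption.

```python
# Figures quoted in the Background section.
annual_output_kwh = 7e9      # about 7 TWh per unit per year
annual_revenue_rub = 21e9    # roughly 21 billion rubles per unit per year
pprs_in_2024 = 9
avg_ppr_days = 53

# Assumption: revenue is spread evenly over a 365-day year.
price_rub_per_kwh = annual_revenue_rub / annual_output_kwh
revenue_per_day_rub = annual_revenue_rub / 365
cumulative_downtime_days = pprs_in_2024 * avg_ppr_days

print(f"implied price: {price_rub_per_kwh:.1f} rubles/kWh")
print(f"revenue per unit-day: {revenue_per_day_rub / 1e6:.1f} million rubles")
print(f"cumulative downtime in 2024: {cumulative_downtime_days} days")
```

The daily revenue comes out at about 57.5 million rubles, which rounds to the 58 million quoted, and 9 outages of 53 days give the 477 days of downtime.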

Problem

Planning a PPR means orchestrating 8,400 operations, 14 contractors, and about 3,000 people on site at the same time. Each operation has dependencies: steam-generator disassembly cannot start until pressure is released; equipment cannot be brought in until the corridor is clear. Planning used to happen in one giant MS Project file, and workers had to dig through Excel every time they needed to see who was waiting for whom.

First problem: in practice, 18% of operations slipped because of unforeseen delays: equipment damage, weather, contractor issues, and unexpected findings during disassembly (e.g., corrosion where there should not be any). Each delay cascaded across the critical path. The plan went stale within 3-4 days, while the repair lasted 50+.

Second problem: under-utilization of personnel. On average 12% of specialist time was spent "waiting for the area to be free". A senior welder paid 14,000 rubles per hour just stood in the locker room.

Solution

Rosatom built a "digital twin" of each plant unit as a living model. It is not just a 3D visualization (what everyone expects on hearing "digital twin") but a full dependency graph of all 8,400 PPR operations plus a live data feed: repair crews mark operation progress in real time through a mobile app, and sensors on critical equipment report its actual physical state.
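The "dependency graph plus live feed" idea can be illustrated with a toy model. This is a minimal sketch, not Rosatom's data model: the class names, status values, and operation identifiers are all hypothetical, and the four operations are taken from the dependency examples in the Problem section.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    op_id: str
    predecessors: set = field(default_factory=set)  # ids that must finish first
    status: str = "pending"  # "pending", "in_progress", or "done"


class TwinGraph:
    """Toy dependency graph updated by progress events (hypothetical)."""

    def __init__(self, operations):
        self.ops = {op.op_id: op for op in operations}

    def apply_event(self, op_id, status):
        # One progress event, as a crew app might report it.
        self.ops[op_id].status = status

    def ready_operations(self):
        # Pending operations whose predecessors are all done.
        return sorted(
            op.op_id
            for op in self.ops.values()
            if op.status == "pending"
            and all(self.ops[p].status == "done" for p in op.predecessors)
        )


twin = TwinGraph([
    Operation("release_pressure"),
    Operation("clear_corridor"),
    Operation("disassemble_steam_generator", {"release_pressure"}),
    Operation("bring_in_equipment", {"clear_corridor"}),
])
print(twin.ready_operations())  # ['clear_corridor', 'release_pressure']

twin.apply_event("release_pressure", "done")
print(twin.ready_operations())  # ['clear_corridor', 'disassemble_steam_generator']
```

Each event changes which operations are unblocked, which is the information a static plan file cannot provide.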

Three models run on top of the digital twin: an optimization-based re-planner and two ML models. The first is the re-planner: every 15 minutes it recomputes the optimal plan for the remaining operations given the current real state. It uses mixed-integer programming with constraints (the Google OR-Tools constraint solver plus a custom branch-and-bound): 8,400 variables and 22,000 constraints, solved in 8-12 minutes on 96 vCPUs.
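The production re-planner is described as a mixed-integer program, and the sketch below does not reproduce it. It swaps in a much simpler technique, the forward pass of the critical path method, to show what "recompute the plan for the remaining operations from the current state" means. It ignores crews, contractors, and space constraints; the function name, operation names, and durations are all hypothetical.

```python
from graphlib import TopologicalSorter


def replan(durations, predecessors, done, now):
    """Earliest start times (hours) for operations that are not yet done.

    durations:    op -> duration in hours
    predecessors: op -> set of ops that must finish first
    done:         set of ops already finished
    now:          hours elapsed since the start of the PPR
    """
    start, finish = {}, {}
    for op in TopologicalSorter(predecessors).static_order():
        if op in done:
            finish[op] = now  # already finished, so successors may start now
            continue
        start[op] = max([now] + [finish[p] for p in predecessors.get(op, ())])
        finish[op] = start[op] + durations[op]
    return start, max(finish.values(), default=now)


durations = {
    "release_pressure": 4, "clear_corridor": 2, "disassemble_sg": 10,
    "bring_in_equipment": 3, "inspect_tubes": 6,
}
predecessors = {
    "release_pressure": set(),
    "clear_corridor": set(),
    "disassemble_sg": {"release_pressure"},
    "bring_in_equipment": {"clear_corridor"},
    "inspect_tubes": {"disassemble_sg", "bring_in_equipment"},
}

# Initial plan at hour 0.
first_start, first_makespan = replan(durations, predecessors, set(), 0)
print(first_makespan)  # 20

# Pressure release slipped and only finished at hour 6: re-plan from there.
new_start, new_makespan = replan(
    durations, predecessors, {"release_pressure", "clear_corridor"}, 6
)
print(new_start["inspect_tubes"], new_makespan)  # 16 22
```

A two-hour slip on the critical path moves the finish from hour 20 to hour 22. Re-running the computation on every update surfaces that shift immediately, whereas a real solver would also search for a reordering that recovers the lost time.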

The second is the predictor: it forecasts the delay risk of each operation from history (where delays happened before, and at which stage) and current parameters (crew experience, equipment condition, parts availability in block storage). This gives section leaders a "risk map" for tomorrow, showing where to be especially attentive.
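To make the "risk map" idea concrete, here is a deliberately simple baseline, not the PyTorch predictor named in the technology stack. It uses history only (a smoothed delay rate per operation type) and ignores the current parameters the real model takes into account. The function names, operation types, and sample history are hypothetical; the 18% prior is the overall slip rate quoted in the Problem section, and the prior weight of 10 is an arbitrary illustrative choice.

```python
from collections import defaultdict


def delay_rates(history, prior_rate=0.18, prior_weight=10):
    """Smoothed historical delay rate per operation type.

    history: iterable of (op_type, was_delayed) pairs from past PPRs.
    Types with little data are pulled toward prior_rate.
    """
    counts = defaultdict(lambda: [0, 0])  # op_type -> [delayed, total]
    for op_type, was_delayed in history:
        counts[op_type][0] += int(was_delayed)
        counts[op_type][1] += 1
    return {
        op_type: (delayed + prior_rate * prior_weight) / (total + prior_weight)
        for op_type, (delayed, total) in counts.items()
    }


def risk_map(planned_ops, rates, prior_rate=0.18, top=3):
    """Rank planned (op_id, op_type) pairs by estimated risk, highest first."""
    scored = [(rates.get(op_type, prior_rate), op_id) for op_id, op_type in planned_ops]
    return [(op_id, round(risk, 2)) for risk, op_id in sorted(scored, reverse=True)[:top]]


# Hypothetical history: welding slipped 6 times out of 10, inspection once out of 10.
history = [("welding", i < 6) for i in range(10)] + [("inspection", i < 1) for i in range(10)]
rates = delay_rates(history)

tomorrow = [("op-101", "welding"), ("op-102", "inspection"), ("op-103", "transport")]
for op_id, risk in risk_map(tomorrow, rates):
    print(op_id, risk)
```

An operation type with no history, such as "transport" here, falls back to the prior rate rather than to zero risk.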

The third is the anomaly detector: it catches "unexpected" events in the data feed (e.g., a crew starting an operation that is not yet scheduled to start) and automatically generates an alert to the PPR command center. This cut mean reaction time from 3 hours to 12 minutes.
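The example given for the anomaly detector, an operation starting before it should, can be expressed as a plain rule check. The sketch below is rule-based and is not the learned detector named in the technology stack; the function name, event fields, and operation identifiers are hypothetical.

```python
def check_start_event(op_id, event_time, scheduled_start, predecessors, status):
    """Return alert messages for one 'operation started' event.

    event_time and scheduled_start values are hours since the start of the PPR.
    status maps an operation id to "pending", "in_progress", or "done".
    """
    alerts = []
    unfinished = sorted(
        p for p in predecessors.get(op_id, ()) if status.get(p) != "done"
    )
    if unfinished:
        alerts.append(f"{op_id}: started with unfinished predecessors {unfinished}")
    if event_time < scheduled_start[op_id]:
        alerts.append(
            f"{op_id}: started at t={event_time}, scheduled for t={scheduled_start[op_id]}"
        )
    return alerts


predecessors = {"disassemble_aux_circuit_a": {"depressurize_aux_circuit_b"}}
scheduled_start = {"disassemble_aux_circuit_a": 30}
status = {"depressurize_aux_circuit_b": "in_progress"}

alerts = check_start_event(
    "disassemble_aux_circuit_a", 26, scheduled_start, predecessors, status
)
for message in alerts:
    print(message)
```

Rules like these only cover violations that can be written down in advance; a learned detector is presumably meant to catch patterns that no rule anticipates.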

Result

The number of "cascading" delays (one slipped operation pushing back 5 or more others) dropped from 47 per PPR to 6. The 15-minute re-planner does not let slippages accumulate: they are isolated and handled locally.

Side effect: safety. The anomaly detector, with its 12-minute mean reaction time, prevented two potentially dangerous incidents in 2024: a crew started disassembling one auxiliary circuit without verifying that the adjacent one was depressurized. The alert fired in 67 seconds, and the site was stopped in 4 minutes. Without AI this could have escalated.

Key implementation challenge: culture. Crew foremen with 30 years of experience do not like "the computer" reordering their work. The team settled on an important framing: "the computer doesn't give orders, it offers alternatives; the foreman decides". Recommendation acceptance grew from 23% (first 3 months) to 81% (after 14 months of operation).

Technology stack

Google OR-Tools (CP-SAT solver)
Custom MIP scheduler
Apache Kafka (event stream)
TimescaleDB
PyTorch (predictor + anomaly detector)
Unity 3D (digital twin visualization)
iOS/Android crew app

Timeline

Prototype on 1 Leningrad NPP unit: 14 months. Full digital twin model: another 8 months. ML layer on top: 6 months. Rollout across all 38 units: 24 months total.

Team

127 people: digital twin engineers (28), ML (18), nuclear domain experts (22), backend (16), mobile (12), 3D viz (11), data engineers (10), QA (10)

Lessons learned

A "digital twin" is a dependency graph + live data feed, not a 3D picture. 3D is the UI on top.
A 15-minute re-planner beats a powerful once-a-day scheduler. Plan drift accumulates faster than intuition suggests.
Constraint solvers (OR-Tools) beat ML for rigidly structured problems. ML is added for prediction + anomalies.
Anomaly detection with a 12-minute reaction time prevents situations where 3 hours is too late. Especially in nuclear.
"AI suggests, human decides" is a mandatory framing for critical operations. Otherwise adoption is zero.
