Ozon: how a 1.2B-parameter transformer boosted GMV by 14%

GMV grew 14%, about 340 billion rubles annualized. Main-screen conversion rose from 4.1% to 6.8%. Average order value rose 8%, since the model better understands which up-sell items are relevant. Main-feed CTR rose 31%.

GMV: +14%
Main-screen conversion: 4.1% → 6.8%
Time to first purchase: 12 → 4 days
p50 latency: 28 ms

Context

Ozon is Russia's second-largest e-commerce platform after Wildberries: 220M SKUs, 64M active buyers, 19,000 employees. 2024 GMV: 2.4 trillion rubles. The app's main screen sees 18M unique users per day. Each app open rebuilds the main screen, so that is 18M personalized feeds × 60 product cards = 1.08 billion product impressions per day, and each one needs to be relevant.
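The load figure above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch: the function names are made up for illustration, and it follows the text's own simplification of one 60-card feed per user per day, so real traffic with repeated app opens and daily peaks would be higher than the average shown.

```python
# Figures taken from the text above.
DAILY_USERS = 18_000_000      # unique main-screen users per day
CARDS_PER_FEED = 60           # product cards in one personalized feed
SECONDS_PER_DAY = 24 * 60 * 60


def daily_impressions(users: int, cards: int) -> int:
    """Product impressions per day, assuming one feed per user per day."""
    return users * cards


def average_rate_per_second(per_day: int) -> float:
    """Average rate over 24 hours; ignores daily peaks."""
    return per_day / SECONDS_PER_DAY


impressions = daily_impressions(DAILY_USERS, CARDS_PER_FEED)
print(f"{impressions:,} impressions per day")
print(f"{average_rate_per_second(impressions):,.0f} impressions per second on average")
```

Even the flat average is on the order of ten thousand ranked items per second, which is why the serving design matters as much as the model.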

Problem

The legacy recommendation system was collaborative filtering based on matrix factorization, running since 2019: the standard "you might also like". It works for frequent buyers with a long history. For new users (50,000 registrations per day) it hits cold start and falls back to "platform popular", which is the same thing for everyone.

Main pain: temporal mismatch. If a user bought a kid's bicycle in April, the system kept showing bicycles in August ("popular in category!"). What they need by then is a school backpack, and the system doesn't get it. Ozon's main-screen conversion was 4.1%; competitors in mature markets reach 7-9%.

Second pain: no context. Day of week, time of day, weather in the user's region, what they did in the app in the last 30 seconds: all ignored. A user at work in the morning searches for one thing; the same user at home in the evening searches for another. The legacy model didn't differentiate.

Solution

Ozon's team built their own architecture called PRISM (Personalized Recommendation Inference at Scale Model). It's a 1.2 billion-parameter transformer trained on 940 million user sessions. Architecturally it is an encoder-decoder: the encoder reads "everything we know about the user" as one long sequence (purchase history, views, searches, clicks, likes, favorites, over all time), and the decoder generates a ranked list of item IDs for the next display.

Key innovation: a session-level contextual layer. Before the main transformer runs, a small "here and now" model (an LSTM with 18M parameters) looks at the last 30 seconds of activity and forms a "short intent" (e.g., "shopping for wife's gift", "browsing holiday items", "checking sizes"). This short-term intent feeds into the main transformer as additional context tokens.
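One way to picture the contextual layer is as a step that splits the event stream at the 30-second mark and places the intent output alongside the long-term history in the encoder input. The sketch below is an illustration under assumptions, not PRISM's actual code: the `Event` type, the string token format, the choice to put intent tokens first, and `toy_intent_model` (a trivial stand-in for the LSTM) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

INTENT_WINDOW_S = 30.0  # "last 30 seconds", from the text


@dataclass(frozen=True)
class Event:
    kind: str         # hypothetical event types: "view", "search", "purchase"
    item_id: int
    timestamp: float  # seconds


def split_events(events: List[Event], now: float) -> Tuple[List[Event], List[Event]]:
    """Split events into (long-term history, last-30-seconds window)."""
    recent = [e for e in events if now - e.timestamp <= INTENT_WINDOW_S]
    history = [e for e in events if now - e.timestamp > INTENT_WINDOW_S]
    return history, recent


def toy_intent_model(recent: List[Event]) -> List[str]:
    """Stand-in for the LSTM: labels the window by its most common event kind."""
    if not recent:
        return ["<intent:none>"]
    dominant = Counter(e.kind for e in recent).most_common(1)[0][0]
    return [f"<intent:{dominant}>"]


def build_encoder_input(
    events: List[Event],
    now: float,
    intent_model: Callable[[List[Event]], List[str]] = toy_intent_model,
) -> List[str]:
    """Intent tokens first (an assumption), then history in chronological order."""
    history, recent = split_events(events, now)
    history_tokens = [
        f"{e.kind}:{e.item_id}" for e in sorted(history, key=lambda e: e.timestamp)
    ]
    return intent_model(recent) + history_tokens


events = [
    Event("purchase", 1, 0.0),
    Event("search", 2, 990.0),
    Event("search", 3, 995.0),
    Event("view", 4, 999.0),
]
print(build_encoder_input(events, now=1000.0))
```

Keeping the short-term model behind a plain callable means the main sequence builder does not need to know whether the intent comes from an LSTM or a rule.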

Infrastructure was a serious challenge. PRISM runs 24/7 for 18M users × 60 items, roughly 1B inferences per day. The team optimized via precomputation: user base embeddings refresh hourly, and the "here and now" intent is computed once per session (5-30s latency). Inference runs on a specialized NVIDIA H100 cluster (140 GPUs), with p50 latency of 28ms and p99 of 110ms.
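The precomputation idea amounts to serving expensive values from a cache and recomputing them only after a fixed interval. A minimal sketch follows, assuming an in-process dictionary as a stand-in for the Redis embeddings cache listed in the stack; `TTLCache` and `compute_user_embedding` are hypothetical names, and the one-hour TTL mirrors the hourly refresh described above.

```python
import time
from typing import Callable, Dict, Generic, Hashable, List, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Recompute a value only when its cached copy is older than ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        compute: Callable[[Hashable], V],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._compute = compute
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: skip the expensive computation
        value = self._compute(key)
        self._store[key] = (now, value)
        return value


def compute_user_embedding(user_id: Hashable) -> List[float]:
    """Hypothetical placeholder for the expensive encoder pass over full history."""
    return [float(len(str(user_id)))]


# Base embeddings: refreshed at most once per hour per user.
user_embeddings = TTLCache(ttl_seconds=3600.0, compute=compute_user_embedding)
print(user_embeddings.get("user-42"))  # computed on first access
print(user_embeddings.get("user-42"))  # served from cache
```

With this split, the request path only has to combine a cached base embedding with the per-session intent, which is what keeps tail latency bounded.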

Results

The most dramatic effect: cold start for new users. Previously, a new user's first purchase came on average 12 days after registration, because the "generally popular" feed rarely surfaced what they wanted. With PRISM it takes 4 days, so a new user is monetized three times faster.

The contextual model revealed an interesting pattern: "evening sessions" (after 21:00) give 1.8× more conversions on personal recommendations than "morning" sessions (before 12:00). In the morning, users search with a specific item in mind; in the evening, they browse to relax and are receptive to recommendations. This changed the push-notification marketing strategy: "evening picks" became the priority communication.
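A comparison like the evening-versus-morning one can be computed by bucketing sessions by start hour and dividing conversions by sessions in each bucket. The sketch below assumes the 1.8× figure refers to conversion rate, that timestamps are already in the user's local time, and that a session is a simple (start, converted) pair; the data is synthetic and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def time_bucket(start: datetime) -> str:
    """Bucket boundaries from the text: after 21:00 is evening, before 12:00 is morning."""
    if start.hour >= 21:
        return "evening"
    if start.hour < 12:
        return "morning"
    return "midday"


def conversion_rate_by_bucket(
    sessions: Iterable[Tuple[datetime, bool]]
) -> Dict[str, float]:
    """Share of sessions in each bucket that ended in a purchase."""
    totals: Counter = Counter()
    conversions: Counter = Counter()
    for start, converted in sessions:
        bucket = time_bucket(start)
        totals[bucket] += 1
        conversions[bucket] += int(converted)
    return {bucket: conversions[bucket] / totals[bucket] for bucket in totals}


# Synthetic sessions, not Ozon data.
sessions = [
    (datetime(2024, 5, 2, 9, 0), True),
    (datetime(2024, 5, 2, 9, 30), False),
    (datetime(2024, 5, 2, 10, 0), False),
    (datetime(2024, 5, 2, 11, 0), False),
    (datetime(2024, 5, 2, 21, 5), True),
    (datetime(2024, 5, 2, 22, 0), True),
    (datetime(2024, 5, 2, 22, 30), False),
    (datetime(2024, 5, 2, 23, 0), False),
]
rates = conversion_rate_by_bucket(sessions)
print(rates)
print("evening / morning:", rates["evening"] / rates["morning"])
```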

Tech stack

Custom 1.2B transformer (encoder-decoder)
LSTM context layer (18M)
PyTorch + TorchServe
NVIDIA H100 ×140
Redis (embeddings cache)
Apache Kafka (event stream)
ClickHouse (analytics)

Timeline

PRISM prototype: 11 months. A/B test on 5% of traffic: another 3 months. Full rollout: 18 months total. The model trains every 4 hours on fresh sessions.

Team

73 people: ML researchers (22), ML engineers (18), data engineers (12), MLOps (8), backend integration (8), product (5)

Lessons learned

Long sequences (full user history) beat summaries. Let the transformer decide what matters; don't lose signal to feature engineering.
Short-term intent (last 30s) is a separate layer. Without it, recommendations lag behind what the user wants right now.
Cold start is solved via context: even a new user has "Thursday evening, raining, gift shopping", and that's already a lot.
Precomputed base embeddings plus real-time intent give a p99 of 110ms. Without precomputation that would be impossible.
A 3-month A/B test on 5% of traffic is the only way to prove that +14% GMV isn't random. Getting enough statistical power on recommendation metrics requires more samples than you'd think.
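The sample-size point in the last lesson can be made concrete with the standard two-proportion approximation, n per arm = (z_{1-α/2} + z_{power})² · (p₁(1-p₁) + p₂(1-p₂)) / (p₂ - p₁)². The sketch below is an illustration, not Ozon's actual test design: it works on conversion rate rather than GMV, and the significance level, the power, and the smaller hypothetical lift (4.1% to 4.3%) are assumptions chosen to show how fast the requirement grows as the effect shrinks.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(
    p_control: float,
    p_treatment: float,
    alpha: float = 0.05,   # two-sided significance level (assumed)
    power: float = 0.80,   # probability of detecting a true effect (assumed)
) -> int:
    """Users needed in each arm to detect p_control -> p_treatment."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# The final reported lift (4.1% -> 6.8%) versus a hypothetical early, small lift.
n_large_effect = sample_size_per_arm(0.041, 0.068)
n_small_effect = sample_size_per_arm(0.041, 0.043)
print("per arm, 4.1% -> 6.8%:", n_large_effect)
print("per arm, 4.1% -> 4.3%:", n_small_effect)
```

Required samples scale with the inverse square of the effect, so small lifts need far more traffic than large ones. Revenue metrics such as GMV per user are usually noisier than a binary conversion, which pushes the requirement up further.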
