How Sberbank moved 60% of contact-center load to GigaChat in two years

In the first year, 47 million inquiries (60% of inbound traffic) were handled automatically. Average response time dropped from 8 minutes to 15 seconds, and annualized contact-center OpEx fell by $120M. Customers whose queries the AI resolved without escalation gave an NPS of 71, higher than the 64 recorded for live-agent calls.

The unexpected win was agent turnover, which fell from 40% to 19%: the agents who stayed now handle interesting, complex cases instead of "what's my balance".

60% automated

15 s average response time

$120M saved per year

47M inquiries per year

Context

Sberbank serves 110 million customers through a unified contact center staffed by 7,000 agents across eight Russian regions. Before 2024, 60% of inquiries were routine: balance checks, transfer status, card blocking, PIN changes. Average wait time peaked at eight minutes during rush hours, such as Monday mornings or the 28th of the month, when salaries arrive. Each call cost 170 rubles, making the contact center the fifth-largest operating expense of the retail division.

Problem

The legacy DTMF-based IVR lost 12% of callers before they reached an agent, mostly elderly users, anxious mobile users, and callers with complex queries. Those who got through the IVR waited eight minutes on average. Agents burned out on routine questions, and turnover exceeded 40% per year; hiring and training each replacement cost 280K rubles. Answers were also inconsistent: in 18% of cases, two agents gave different answers to the same question, which is a regulatory risk.

Solution

The core is GigaChat, Sberbank's proprietary LLM, fine-tuned on 4 million archived dialogs with real customers. The model runs in a hybrid architecture: tier 1 is RAG over internal documentation (400 regulations and 2,000 product articles), and tier 2 is function calling into banking APIs for actions such as "block my card". The model tracks conversation context: if the customer has already said "I have a transfer issue", it does not ask for the basic details again. Complex or emotionally charged queries are escalated to a human agent with the full context attached, so the agent sees the transcript and a suggested resolution.
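The three paths described above (RAG answer, banking-API function call, escalation with context) can be sketched as a simple router. This is a minimal illustration under stated assumptions, not Sberbank's implementation: the intent labels, `Conversation`, `Handoff` and `route` are hypothetical, and intent classification is assumed to happen upstream of this function.

```python
from dataclasses import dataclass, field

# Hypothetical intent labels; the case study names these request types,
# not GigaChat's actual intent taxonomy.
ACTION_INTENTS = {"block_card", "change_pin", "transfer_status"}
ESCALATION_INTENTS = {"complaint", "fraud_dispute"}


@dataclass
class Conversation:
    """Dialog state kept across turns so known facts are not asked again."""
    transcript: list[str] = field(default_factory=list)
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class Handoff:
    """Package a human agent receives on escalation."""
    transcript: list[str]
    slots: dict[str, str]
    suggested_resolution: str


def route(intent: str, message: str, conv: Conversation, emotional: bool = False):
    """Pick one of three paths for a customer turn (hypothetical logic)."""
    conv.transcript.append(message)
    if intent in ESCALATION_INTENTS or emotional:
        # Escalate with the full transcript and known slots attached.
        return Handoff(list(conv.transcript), dict(conv.slots),
                       suggested_resolution=f"review: {intent}")
    if intent in ACTION_INTENTS:
        # Tier 2: the action runs through a banking API; stored slots fill its arguments.
        return ("function_call", intent, dict(conv.slots))
    # Tier 1: answer from retrieved regulations and product articles.
    return ("rag_answer", message)
```

Once the customer mentions a transfer problem, a slot such as `conv.slots["topic"] = "transfer"` is stored once and travels with every later function call or handoff, which is what lets the bot skip re-asking.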

The system is deployed on Yandex Cloud plus an on-prem cluster of 96 H100 GPUs. End-to-end latency is 1.4 seconds, faster than a live agent's cold start. For security, the model has no direct access to balances; every data lookup goes through an auth service with 2FA.
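One way to guarantee that the model never touches balances directly is to expose customer data only through a gateway that checks the session's second factor. The sketch below is an assumption, not the bank's design: `AuthGateway`, `Session` and `balance_tool` are hypothetical, the dictionary stands in for the core banking system, and balances are held as integer kopecks to avoid float rounding.

```python
from dataclasses import dataclass


class AuthRequired(Exception):
    """Raised when a data lookup is attempted without a 2FA-verified session."""


@dataclass(frozen=True)
class Session:
    customer_id: str
    second_factor_verified: bool


class AuthGateway:
    """Hypothetical auth service: the only path from the model to customer data."""

    def __init__(self, balances_kopecks: dict[str, int]):
        # Stands in for the core banking system; the model never holds this dict.
        self._balances = balances_kopecks

    def get_balance(self, session: Session) -> int:
        """Return the balance in kopecks, or refuse if 2FA is incomplete."""
        if not session.second_factor_verified:
            raise AuthRequired("complete 2FA before any data lookup")
        return self._balances[session.customer_id]


# The model is given only this tool function, bound to the gateway.
def balance_tool(gateway: AuthGateway, session: Session) -> int:
    return gateway.get_balance(session)
```

Because the check lives in the gateway rather than in the prompt, a model that tries to skip 2FA gets an exception instead of data.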

Result

The biggest risk to overcome was financial hallucination. During the Samara pilot in February 2024, the model confidently quoted a wrong deposit rate. The fix: every number that affects a financial decision must come from a function call to the source-of-truth system, never from generation.
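A post-generation check is one possible way to enforce this rule. In the sketch below, which is an assumption rather than the team's actual mechanism, every number in a draft reply must match a value returned by a tool call in the same turn, or the reply is withheld. It is deliberately stricter than the rule itself: it blocks any unsourced number, not only those that affect a financial decision. The fallback message and function names are hypothetical.

```python
import re

# Integers or decimals with "." or "," as the decimal mark (e.g. 16.5 or 16,5).
# Thousands separators are not handled in this sketch.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def normalize(number: str) -> str:
    """Treat "16,5" and "16.5" as the same value."""
    return number.replace(",", ".")


def ungrounded_numbers(draft: str, tool_values: set[str]) -> list[str]:
    """Numbers in the draft reply that no tool call returned in this turn."""
    allowed = {normalize(v) for v in tool_values}
    found = [normalize(m.group()) for m in NUMBER_RE.finditer(draft)]
    return [n for n in found if n not in allowed]


def release(draft: str, tool_values: set[str]) -> str:
    """Send the draft only if every number in it is grounded."""
    if ungrounded_numbers(draft, tool_values):
        # Hypothetical fallback; a real system might re-run the model with a tool call.
        return "Let me check the current figure for you."
    return draft
```

Falling back to a neutral reply errs on the side of saying nothing; re-running the model with the required tool call would be the friendlier alternative.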

Tech stack

GigaChat 3.5, RAG (Qdrant), function calling, Yandex Cloud, 96× H100 GPUs, Python/FastAPI, WebRTC voice gateway
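In the RAG tier, Qdrant's job is to rank document vectors by similarity to the query. The brute-force sketch below illustrates only that ranking step: `embed` is a toy bag-of-words stand-in for a neural encoder, and the document IDs are hypothetical. Qdrant itself searches an approximate nearest-neighbour (HNSW) index rather than scanning every document.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbour search: IDs of the k most similar docs."""
    q = embed(query)
    ranked = sorted(docs, key=lambda doc_id: cosine(q, embed(docs[doc_id])), reverse=True)
    return ranked[:k]
```

The retrieved passages would then be placed in the model's context; numbers that drive financial decisions still come from function calls, not from these documents.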

Timeline

Pilot: 4 months. Rollout across 8 regions: another 9 months. From kickoff to 60% automation: 18 months.

Team

62 people: ML (18), backend (14), prompt engineering (9), QA (8), ops (7), product (4), security (2)

Lessons learned

Numbers that drive financial decisions come only from function calls, never from generation. A single hallucination is a regulatory risk.
Hybrid RAG plus function calling beats pure RAG: documents go stale, while APIs are the source of truth.
Human handoff with pre-loaded context lifted complex-case NPS by 19 points; without this bridge, automation feels hostile.
Fine-tuning on the bank's own dialog corpus is critical: a general-purpose model misses banking specifics and routinely produces legally risky phrasing.
Agent retention is an undervalued source of ROI: each retained employee saves 280K rubles in hiring and training costs.
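The retention lesson can be turned into a back-of-the-envelope estimate using the case-study figures: 7,000 agents, turnover falling from 40% to 19%, and 280K rubles per replacement. The sketch assumes headcount stays at 7,000, which overstates the result if automation reduced staffing, so treat it as an upper-bound illustration rather than a reported number.

```python
# Inputs taken from the case study; constant headcount is an assumption.
AGENTS = 7_000
TURNOVER_BEFORE_PCT = 40
TURNOVER_AFTER_PCT = 19
REPLACEMENT_COST_RUB = 280_000


def avoided_departures(agents: int, before_pct: int, after_pct: int) -> int:
    """Agents who would have left at the old turnover rate but stayed."""
    # Integer arithmetic avoids float rounding; // floors any fractional agent.
    return agents * (before_pct - after_pct) // 100


def retention_savings_rub(agents: int, before_pct: int, after_pct: int, cost_rub: int) -> int:
    """Annual hiring-and-training cost avoided by lower turnover."""
    return avoided_departures(agents, before_pct, after_pct) * cost_rub


print(avoided_departures(AGENTS, TURNOVER_BEFORE_PCT, TURNOVER_AFTER_PCT))  # → 1470
print(retention_savings_rub(AGENTS, TURNOVER_BEFORE_PCT, TURNOVER_AFTER_PCT, REPLACEMENT_COST_RUB))  # → 411600000
```

With these inputs the estimate is 1,470 avoided departures and about 411.6 million rubles a year.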
