
How Sber trained smart speakers to generate smart home routines by voice

Sber smart speakers can now create smart home routines from voice commands. Say "turn off the light when I leave" and the AI will generate the automation.

Source: Habr AI. Collage: Hamidun News.

Sber has taught GigaChat on its smart speakers to create home automation scenarios directly from voice commands. Users can now say: "Create a scenario so that when I leave home, the lights and heating turn off", and the speaker will generate the automation on its own, without anyone touching a screen.

Voice instead of navigation

Until recently, creating a scenario meant opening an app, finding the right devices in a list, linking them with conditions, and saving the rules by hand. The process was tedious: several taps through different screens and a hunt for the right filters, which put ordinary users off. Now a single phrase is enough.

GigaChat analyzes the user's intent, determines which devices are involved, and generates the scenario in seconds. The idea resembles commands like "OK Google, create a routine," but Sber took its own approach. Rather than the classic path of fine-tuning on thousands of examples, the engineers chose in-context learning: information about the user's specific devices is passed directly into GigaChat's context before generation.

The model sees the actual layout of the home and works with it without any retraining. This saves on data labeling and speeds up adaptation to new devices: if a user buys a new lamp, there is no need to wait for a model update.
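To make the in-context idea concrete, here is a minimal Python sketch of how a backend might place a user's device list into the prompt ahead of the request. The device fields, the `build_prompt` helper, and the prompt wording are all hypothetical; the article does not describe GigaChat's actual prompt format.

```python
import json

# Hypothetical device records; Sber's real catalog format is not public.
DEVICES = [
    {"id": "lamp-1", "name": "sun over the bed", "room": "bedroom",
     "functions": ["on_off", "brightness"]},
    {"id": "heater-1", "name": "heater", "room": "living room",
     "functions": ["on_off", "temperature"]},
]


def build_prompt(user_request: str, devices: list[dict]) -> str:
    """Put the user's own device catalog into the context before the request.

    No model weights change: a newly bought device simply shows up
    in the next prompt.
    """
    catalog = json.dumps(devices, ensure_ascii=False, indent=2)
    return (
        "You create smart home scenarios. Use ONLY the devices listed below.\n"
        f"Devices:\n{catalog}\n\n"
        f"User request: {user_request}\n"
        "Answer with a YAML scenario."
    )


prompt = build_prompt("Turn off the lights and heating when I leave", DEVICES)
print(prompt)
```

Adding a new lamp is just another entry in the list; the next call to `build_prompt` includes it with no retraining step.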

Personalization is the main challenge

The main challenge in smart home management is absolute personalization. One user has 30 devices, another has three. Someone calls a lamp a "lamp," another calls it "bedroom light," a third calls it "sun over the bed."

Sensors, switches, custom scripts: everything can be named completely differently. General-purpose LLMs often struggle with this kind of variability: they guess blindly at device names, confuse rooms, and misread intent. Here, though, an error is unacceptable; this is not a misfired music recommendation.

If a scenario misfires, the user may freeze at night because the heating never turns on, or the air conditioner may run all day in an empty apartment, wasting electricity. The Sber engineers' solution: instead of retraining the model for each user (which is impossible), give it a complete "directory" of the home in the request context.

Before calling GigaChat, the backend collects descriptions of all of the user's devices: what functions they have, which room they are in, and which names identify them. GigaChat sees this information and can rely on it.
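Each catalog entry needs at least the three things listed above: functions, room, and the names the user uses. The sketch below assumes a hypothetical `Device` record, a plain-text rendering for the model's context (`describe`), and a lookup (`find_by_name`) that shows why storing every user-chosen name matters: "sun over the bed" resolves to a concrete device ID.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Hypothetical catalog record; the field names are illustrative.
    device_id: str
    room: str
    functions: set[str]
    names: set[str] = field(default_factory=set)  # every name this user uses for it


def describe(devices: list[Device]) -> str:
    """Render the catalog as compact text lines for the model's context."""
    lines = []
    for d in sorted(devices, key=lambda dev: dev.device_id):
        lines.append(
            f"- {d.device_id} | room: {d.room} | names: {', '.join(sorted(d.names))}"
            f" | functions: {', '.join(sorted(d.functions))}"
        )
    return "\n".join(lines)


def find_by_name(devices: list[Device], name: str) -> list[Device]:
    """Case-insensitive lookup by any of the names the user gave a device."""
    key = name.strip().lower()
    return [d for d in devices if key in {n.lower() for n in d.names}]


catalog = [
    Device("lamp-1", "bedroom", {"on_off", "brightness"},
           {"sun over the bed", "bedroom light"}),
    Device("ac-1", "living room", {"on_off", "temperature"}, {"air conditioner"}),
]
print(describe(catalog))
print([d.device_id for d in find_by_name(catalog, "Sun over the bed")])
```

Sorting the entries and their fields keeps the rendered text stable between requests, which makes prompts easier to compare and debug.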

How it works

The pipeline works roughly like this:

  • The user speaks to the speaker: "Create a goodnight scenario"
  • The speaker recognizes speech and sends the text to the backend
  • The backend requests the user's entire device catalog with function descriptions
  • The catalog and the request are passed to GigaChat, which generates a YAML description of the scenario
  • The scenario machine validates the result — checks that all devices actually exist
  • If the check passes, the scenario is saved and becomes active
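The steps above can be sketched as a single orchestration function. This is an illustration under stated assumptions, not Sber's code: speech recognition is assumed to have already produced `text`, the catalog is simplified to a mapping from device ID to supported functions, the model's YAML is assumed to be already parsed into a dict, and `load_catalog`, `generate`, and `save` are hypothetical stand-ins for the backend, the GigaChat call, and storage.

```python
from typing import Callable

# Hypothetical simplified types: a catalog maps device ID -> supported functions,
# and a scenario is the model's YAML output after parsing into a dict.
Catalog = dict[str, set[str]]
Scenario = dict


def handle_utterance(
    text: str,
    load_catalog: Callable[[], Catalog],
    generate: Callable[[str, Catalog], Scenario],
    save: Callable[[Scenario], None],
) -> str:
    """Turn recognized speech into a saved scenario, or a clarifying question."""
    catalog = load_catalog()            # fetch the user's device catalog
    scenario = generate(text, catalog)  # LLM call with the catalog in context
    # Validate: every device the scenario touches must exist in the catalog.
    unknown = [a["device"] for a in scenario.get("actions", [])
               if a["device"] not in catalog]
    if unknown:
        return f"I couldn't find: {', '.join(unknown)}. Which devices did you mean?"
    save(scenario)                      # the scenario becomes active
    return f"Scenario '{scenario['name']}' saved."


# Usage with stand-ins for the model and storage:
def fake_generate(text: str, catalog: Catalog) -> Scenario:
    return {"name": "goodnight", "trigger": {"time": "23:00"},
            "actions": [{"device": "lamp-1", "command": "off"}]}


saved: list[Scenario] = []
reply = handle_utterance("Create a goodnight scenario",
                         lambda: {"lamp-1": {"on_off"}}, fake_generate, saved.append)
print(reply)
```

Passing the catalog loader, model call, and storage in as functions keeps the sketch testable without a real model or device backend.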

Validation by the scenario machine is a safety net. If GigaChat makes a small error (for example, references a sensor that doesn't exist or gets the command syntax wrong), the scenario machine catches it and either corrects it or asks the user to clarify. In effect, it acts as an error check on every generated rule.
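A minimal sketch of this kind of check, assuming the generated YAML has already been parsed into a dict with a `trigger` and a list of `actions`. The scenario schema, the `COMMAND_NEEDS` table, and the `validate` function are hypothetical. The sketch only detects problems and returns them as messages that could feed a clarifying question; it does not attempt the automatic correction the article mentions.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a scenario command to the device function it requires.
COMMAND_NEEDS = {
    "on": "on_off",
    "off": "on_off",
    "set_brightness": "brightness",
    "set_temperature": "temperature",
}


@dataclass
class ValidationResult:
    ok: bool
    problems: list[str] = field(default_factory=list)


def validate(scenario: dict, catalog: dict[str, set[str]]) -> ValidationResult:
    """Check a parsed scenario against the user's catalog (device ID -> functions)."""
    problems: list[str] = []
    trigger = scenario.get("trigger", {})
    # A trigger may reference a sensor; it has to be a real device.
    if "device" in trigger and trigger["device"] not in catalog:
        problems.append(f"trigger uses unknown device {trigger['device']!r}")
    for i, action in enumerate(scenario.get("actions", [])):
        device, command = action.get("device"), action.get("command")
        if device not in catalog:
            problems.append(f"action {i}: unknown device {device!r}")
            continue
        needed = COMMAND_NEEDS.get(command)
        if needed is None:
            problems.append(f"action {i}: unknown command {command!r}")
        elif needed not in catalog[device]:
            problems.append(f"action {i}: {device!r} does not support {command!r}")
    return ValidationResult(ok=not problems, problems=problems)


catalog = {"lamp-1": {"on_off", "brightness"}, "door-sensor": {"contact"}}
bad = {
    "trigger": {"device": "motion-9", "state": "detected"},
    "actions": [
        {"device": "lamp-1", "command": "set_temperature"},
        {"device": "ghost", "command": "off"},
    ],
}
result = validate(bad, catalog)
if not result.ok:
    print("Please clarify: " + "; ".join(result.problems))
```

Collecting every problem instead of stopping at the first one lets the speaker ask a single clarifying question that covers all of them.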

What this means

Smart homes become more accessible to ordinary people. If a speaker reliably creates scenarios by voice, a newcomer doesn't need to learn the interface or read 50 pages of instructions: just say what you want, and the system does it. This is an important step in moving smart homes out of the enthusiast niche and into the mass market, where people value simplicity above all else.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.