Fix Price launched a VLM service for shelf and price tag monitoring in 8,000 stores
AI-processed from Habr AI; edited by Hamidun News
Fix Price has automated the monitoring of product displays and price labels across its 8,000+ stores by deploying a computer vision service built on external Vision Language Models (VLMs), without developing proprietary ML models from scratch or waiting through multi-year training cycles.
8,000 stores, one task
Fix Price is a chain of fixed-price stores serving tens of millions of customers across Russia and the CIS. More than 8,000 points of sale means thousands of shelves to check every day: are products arranged according to the planogram, does each item have a price label, are there any empty spaces? Manual control at this scale is unrealistic: you cannot send an auditor to each of the 8,000 stores every day.
At the same time, the cost of an error is direct: an empty shelf or an incorrect price label means a lost sale right now, plus a negative customer experience that lingers. In a network of this size, even a small percentage of such cases adds up to tangible financial losses. The Fix Price data analytics center therefore faced a challenge: to detect shelf display violations and pricing errors automatically, quickly, at industrial scale, and without excessive investment in its own computer vision (CV) infrastructure.
Why VLM, not a custom model
The classic computer vision approach in retail is to train a custom neural network on annotated photos of shelves. It works, but it requires thousands of annotated images, a team of ML engineers, training infrastructure, and a long retraining cycle whenever the assortment changes. Fix Price chose an alternative: external Vision Language Models (VLMs). These are multimodal models that can analyze an image and answer questions about it in natural language, a principle similar to GPT-4o Vision or Claude with image support.
Key advantages of the VLM approach in this case:
- Quick start without a large annotated dataset
- One model simultaneously checks displays, price labels, and product availability
- New types of checks are added by changing the prompt — without retraining
- Flexibility when expanding to new categories and store formats
- Reduced development and maintenance costs compared to custom CV
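The point about adding checks through the prompt can be illustrated with a minimal sketch. It is not Fix Price's implementation: the `Check` class, the `build_prompt` helper, the check wording, the requested JSON keys and the `promo_tag` check are all hypothetical and serve only to show that a new check can be a data change rather than a retraining job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    code: str         # hypothetical short identifier for the check
    instruction: str  # plain-language question the model should answer


# The three checks the article mentions; the wording is illustrative.
BASE_CHECKS = [
    Check("planogram", "Are products arranged according to the planogram?"),
    Check("price_label", "Does every product have a price label?"),
    Check("empty_space", "Are there empty spaces on the shelf?"),
]


def build_prompt(checks):
    """Assemble one text prompt covering every check in the list."""
    lines = [
        "You are inspecting a photo of a retail shelf.",
        "Report each violation as an object in a JSON list with the keys",
        '"check", "location" and "comment". Return [] if nothing is wrong.',
        "",
        "Checks:",
    ]
    lines += [f"- [{c.code}] {c.instruction}" for c in checks]
    return "\n".join(lines)


# Adding a new type of check only extends the list (assumed example check).
extended = BASE_CHECKS + [
    Check("promo_tag", "Is a promotional tag present where one is expected?"),
]
print(build_prompt(extended))
```

In this sketch the model and the surrounding code stay the same when the list grows; only the prompt text changes.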
How the service works
Images come from surveillance cameras or from the mobile devices of store employees. The VLM receives a photo and analyzes the frame against a set of criteria: compliance with the planogram, presence of a price label for each item, and absence of empty spaces on the shelf. The output is a structured list of violations tied to a specific store. The responsible employee receives an alert and fixes the problem before customers run into it. Response times improve, and manual walk-throughs with notebooks become less frequent.
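A rough sketch of that flow, from photo to a list of violations tied to a store, might look as follows. Everything here is an assumption made for illustration: the article does not describe the response format or the model API, so the JSON shape, the `Violation` fields, the `ask_vlm` callable and the `fake_vlm` stub are hypothetical, and the stub stands in for a real call to an external VLM.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Violation:
    store_id: str  # ties the finding to a specific store
    check: str     # which criterion failed, e.g. "price_label"
    location: str  # where on the shelf, as described by the model
    comment: str   # free-text detail from the model


def parse_violations(raw: str, store_id: str) -> list[Violation]:
    """Turn the model's text reply into structured records.

    Assumes the model was asked to answer with a JSON list of objects.
    Malformed replies yield an empty list here; a real service would
    more likely log them and retry.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    result = []
    for item in items:
        if not isinstance(item, dict) or "check" not in item:
            continue  # skip entries that do not name a check
        result.append(Violation(
            store_id=store_id,
            check=str(item["check"]),
            location=str(item.get("location", "")),
            comment=str(item.get("comment", "")),
        ))
    return result


def inspect_photo(photo: bytes, store_id: str, prompt: str,
                  ask_vlm: Callable[[bytes, str], str]) -> list[Violation]:
    """Send one photo and prompt to the model, return the violations."""
    return parse_violations(ask_vlm(photo, prompt), store_id)


def fake_vlm(photo: bytes, prompt: str) -> str:
    """Stub standing in for an external VLM call (hypothetical reply)."""
    return ('[{"check": "price_label", "location": "second shelf, left", '
            '"comment": "no label under the product"}]')


alerts = inspect_photo(b"", "store-0001", "Check the shelf.", fake_vlm)
for a in alerts:
    print(f"{a.store_id}: {a.check} at {a.location} ({a.comment})")
```

Passing the model call in as a function keeps the parsing and alerting logic independent of whichever external VLM is used.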
"I think we all know how customers react to a missing price label or a wrong price on it, and what feelings an empty shelf evokes when the product you came for is not there," says Kristina Istratova, Head of Data Analytics Center at Fix Price.
What this means
The Fix Price case shows that VLMs have lowered the entry barrier to industrial computer vision far enough for a large retailer to launch a working service without a multi-year ML project. 8,000 stores is not a pilot but a real production load. For the rest of retail, the signal is clear: automating shelf control no longer requires an in-house ML lab.