Daniil Shcherbakov's startup cut voice-agent pauses to under 0.3 seconds and scaled to 1 million calls a month
AI-processed from Habr AI; edited by Hamidun News
Daniil Shcherbakov's startup demonstrates how voice AI agents for business are moving away from traditional scripted robots. Through custom orchestration, fine-tuned models, and CRM integration, the system responds faster than humans, handles high-volume dialing, and is already in commercial use.
Why calls sound more lifelike
The key difference from classical intent-based robots is that the agent works not from a rigid set of phrases but from company context, the purpose of the call, and the dialogue history. The article gives an example from a plant nursery: a client asked about planting times and whether a pear tree with a closed root system was in stock, and the agent was not thrown off course; it carried on the conversation as a consultant rather than an automated responder. Moments like these, according to the author, change attitudes toward outbound calls: people no longer hear endless pauses, repetitions, and attempts to steer them back to the original branch.
"Autumn is a good period for planting fruit trees."
For businesses, this matters beyond user experience. A live call center requires lengthy training, quality control, and constant hiring, and results still depend on employee fatigue and domain knowledge. In real estate, for example, a manager begins selling consistently only after hundreds or thousands of calls. An AI agent is free from this variance: it talks the same way in the morning, at night, and at the end of the week. According to the article, people also more often continue the conversation even after learning that the caller is not a human.
How the stack is built
Inside, the platform is structured as a modular system with a unified orchestrator. First, the caller's speech is converted to text in real-time by the recognition module, then this text is processed by a language model together with dialogue logic, after which the response goes to speech synthesis. In parallel, the system writes contact history, lead status, and key metrics to the CRM and internal analytics.
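The flow described above can be sketched as a single turn handler. This is a minimal illustration, not the team's implementation: the function names (`recognize`, `generate_reply`, `synthesize`, `log_to_crm`) and their stub behavior are hypothetical stand-ins for the recognition, language-model, synthesis, and CRM modules.

```python
from __future__ import annotations

import asyncio


# Hypothetical stand-ins for the real modules; each one only simulates work.
async def recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


async def generate_reply(text: str, history: list[str]) -> str:
    return f"Reply to: {text}"


async def synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


async def log_to_crm(crm: list[dict], record: dict) -> None:
    crm.append(record)


async def handle_turn(audio: bytes, history: list[str], crm: list[dict]) -> bytes:
    """One turn: speech -> text -> reply -> speech, with CRM logging alongside."""
    text = await recognize(audio)
    reply = await generate_reply(text, history)
    history.extend([text, reply])
    # Synthesis and CRM logging run concurrently, so logging does not delay the reply.
    speech, _ = await asyncio.gather(
        synthesize(reply),
        log_to_crm(crm, {"caller_said": text, "agent_said": reply}),
    )
    return speech


def run_demo():
    history: list[str] = []
    crm: list[dict] = []
    speech = asyncio.run(handle_turn(b"Do you have pear trees?", history, crm))
    return speech, history, crm
```

Running logging alongside synthesis is one plausible reading of "in parallel"; the source does not say how the real orchestrator schedules these steps.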
The key metric is latency under 0.3 seconds between the human's remark and the system's response. This is the threshold at which the conversation stops sounding like a typical robotic outbound call and begins to feel like an ordinary phone dialogue.
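A threshold like this is easiest to reason about as a budget shared by the stages of one turn. The sketch below assumes three stages and uses invented per-stage timings purely for illustration; the source states only the overall 0.3-second figure.

```python
from __future__ import annotations

LATENCY_BUDGET_S = 0.3  # overall threshold stated in the article


def total_latency(stage_latencies: dict[str, float]) -> float:
    """Sum per-stage latencies (in seconds) for one conversational turn."""
    return sum(stage_latencies.values())


def within_budget(stage_latencies: dict[str, float],
                  budget: float = LATENCY_BUDGET_S) -> bool:
    """True if the whole turn finishes strictly under the budget."""
    return total_latency(stage_latencies) < budget


# Illustrative numbers only; these are assumptions, not measurements from the source.
example_turn = {"recognition": 0.08, "language_model": 0.12, "synthesis": 0.07}
slow_turn = {"recognition": 0.10, "language_model": 0.25, "synthesis": 0.07}
```

With these assumed timings, `within_budget(example_turn)` is true and `within_budget(slow_turn)` is false, which shows how one slow stage can consume the entire budget.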
The emphasis is not on the LLM itself but on combining the model with hard scenario constraints. A base model can generate a plausible but off-target response, whereas a sales call depends on lead qualification, required questions, objection handling, and moving the conversation to the next step. The team therefore built its own dialogue layer on top of the model. According to the author, it was trained on large volumes of real business conversations and client scripts, and its task is to keep the conversation within business logic even if the other party answers in an unexpected way or suddenly changes the subject.
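One simple way to picture a layer that keeps a conversation inside business logic is a checklist of required questions wrapped around the model's free-form reply. This is a deliberately simplified, rule-based stand-in: the author describes a trained dialogue layer, and the `ScenarioGuard` class and its methods are hypothetical names introduced here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ScenarioGuard:
    """Tracks required qualification questions and steers replies back to them."""
    required_questions: list[str]
    answered: set[str] = field(default_factory=set)

    def mark_answered(self, question: str) -> None:
        # Only questions that belong to the scenario are tracked.
        if question in self.required_questions:
            self.answered.add(question)

    def next_required(self) -> str | None:
        # Return the first required question that has not been answered yet.
        for question in self.required_questions:
            if question not in self.answered:
                return question
        return None

    def steer(self, model_reply: str) -> str:
        """Append the next pending question so the reply returns to the scenario."""
        pending = self.next_required()
        if pending is None:
            return model_reply
        return f"{model_reply} {pending}"
```

For example, if the caller drifts to planting advice, the model's answer is kept and the next unanswered qualification question is appended, so the off-topic reply still moves the call forward.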
What the numbers show
The material states that launching such an agent takes just days: first, they collect data about the product and scenarios, then configure the agent, connect the CRM, and launch testing. After this, the system is managed through a cloud dashboard where you can quickly change dialogue logic, test hypotheses with A/B tests, and view analytics for each dialogue.
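A/B testing of dialogue logic needs each call to land in a stable variant. The source does not say how the dashboard does this; the sketch below shows one common approach, hashing a call identifier, with `assign_variant` and `conversion_rate` as hypothetical helper names.

```python
import hashlib


def assign_variant(call_id: str, variants=("A", "B")) -> str:
    """Deterministically map a call ID to a variant via a SHA-256 hash."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def conversion_rate(qualified_leads: int, calls: int) -> float:
    """Share of calls that ended in a qualified lead; 0.0 when there were no calls."""
    return qualified_leads / calls if calls else 0.0
```

Because the assignment depends only on the call ID, the same contact always hears the same script variant, which keeps per-variant conversion rates comparable.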
Already at this stage, the value shifts from simple cost savings to response speed: the service can dial through large databases in minutes, while competitors are still distributing leads among operators.
- Conversion to qualified lead in cold outbound calls for a construction developer increased by 50%.
- In cleaning services, conversion from request to lead rose from 48% to 59%.
- Outbound call costs in one case dropped by 60%.
- Response time to an incoming request fell from 1.5 hours to 3 seconds.
- Scaling from 5,000 to 20,000 calls per day takes not months of hiring, but a few days of configuration.
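The figures in the list can be put on a common footing with a little arithmetic. The sketch below uses only numbers quoted above; the 30-day month is an assumption, and the helper names are illustrative.

```python
import math


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


def calls_per_day_for(monthly_target: int, days: int = 30) -> int:
    """Daily volume needed to reach a monthly target (30-day month assumed)."""
    return math.ceil(monthly_target / days)


# Cleaning services: 48% -> 59% conversion from request to lead.
point_gain = 59 - 48                           # 11 percentage points
relative_gain = relative_change(48, 59)        # about 0.229, i.e. roughly 22.9%

# Response time: 1.5 hours -> 3 seconds.
speedup = (1.5 * 3600) / 3                     # 1800.0 times faster

# Volume: 20,000 calls per day over an assumed 30-day month.
monthly_at_20k = 20_000 * 30                   # 600,000 calls
daily_for_million = calls_per_day_for(1_000_000)  # 33,334 calls per day
```

Separating percentage points from relative change matters when comparing the bullets: the 11-point gain in cleaning services corresponds to a relative increase of about 23%.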
The author also emphasizes manageability. In a traditional call center only a small share of conversations is monitored; here every call can be analyzed and scenarios corrected quickly. This matters for companies competing for the same contact base: when an entire database can be processed in 9–10 minutes, the advantage goes not to the company with more operators but to the one that qualifies leads faster and passes them to sales. At that pace, a million calls per month is no longer exotic.
What this means
Voice AI agents are gradually moving from the category of "wow-factor demos" to a full-fledged operational tool. If the stated metrics are confirmed across different verticals, businesses get not just a replacement for first-line support but a managed sales and service channel where response speed, scenario consistency, and scalability matter more than human improvisation.