"Advanced Payment Solutions" Launched Voice AI Assistant for Calls to Pilot Without ML Team

"Advanced Payment Solutions" presented a case that is rare on the market: a voice AI assistant for calls built not by ML engineers but by 12 backend developers. In six months, the team took the product to a pilot, dropped fine-tuning and heavy vector databases, moved generation to a locally hosted Qwen 8B, and reached a working latency of about two seconds, fast enough for real-time prompts to managers during calls.

Khamidun Zhemal

AI monitoring · Habr AI

Apr 30, 2026 · 3 min

AI-processed from Habr AI; edited by Hamidun News

The company "Advanced Payment Solutions" shared how it took a voice AI call assistant through a pilot without an in-house ML team. In six months, 12 backend developers built a system that suggests to managers in real time how to respond to clients, with a latency of around two seconds.

How the MVP was built

Inside the company, the project was named "Prompter". Its job is to follow the already transcribed conversation, identify which product is being discussed, detect client objections and immediately show the manager a text prompt. The final stack was built on Python, FastAPI and PostgreSQL, with BERT classifiers handling classification and a local Qwen 8B handling generation.
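The article names the components but not how they are wired together. A minimal sketch of one plausible flow for a single transcript segment, assuming the classifiers and the generator are plain callables. `handle_segment`, `Prompt` and the rule that skips generation when no objection is detected are hypothetical, not the team's code:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Prompt:
    product: str
    has_objection: bool
    text: Optional[str]  # None when no hint is shown to the manager


def handle_segment(
    transcript: str,
    classify_product: Callable[[str], str],   # stand-in for the BERT product classifier
    detect_objection: Callable[[str], bool],  # stand-in for the binary objection classifier
    scripts: dict[str, str],                  # product -> sales script
    generate: Callable[[str], str],           # stand-in for the local LLM
) -> Prompt:
    """Turn one transcribed chunk of a call into an optional hint (hypothetical flow)."""
    product = classify_product(transcript)
    has_objection = detect_objection(transcript)
    if not has_objection:
        # Assumed gating: skip the slowest step when the manager needs no help.
        return Prompt(product, False, None)
    script = scripts.get(product, "")
    llm_input = f"Script:\n{script}\n\nClient said:\n{transcript}\n\nSuggest a reply:"
    return Prompt(product, True, generate(llm_input))


if __name__ == "__main__":
    hint = handle_segment(
        "That card is too expensive for me",
        classify_product=lambda t: "credit_card",
        detect_objection=lambda t: "expensive" in t,
        scripts={"credit_card": "Mention that servicing is free."},
        generate=lambda p: "Point out that servicing is free.",
    )
    print(hint)
```

In a real deployment the three callables would wrap trained models; the lambdas here only make the data flow visible.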

For the business, this is a way to reduce the load on mentors and get new employees to their KPI targets faster, especially since the ecosystem has more than 35 products and a manager has to keep too many scenarios in mind. The key constraint was strict: the system has only 1.5–2 seconds to respond; otherwise the prompt becomes useless in the middle of a live dialogue.
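One way to enforce such a budget is to put a hard timeout on generation and discard hints that arrive too late. This asyncio sketch assumes a 2-second budget; `hint_within_budget` and `_fake_generation` are hypothetical names, and the article does not say how the team implemented the cutoff:

```python
import asyncio
from typing import Awaitable, Optional

# Assumed value taken from the upper end of the 1.5-2 s window.
PROMPT_BUDGET_S = 2.0


async def hint_within_budget(
    pending: Awaitable[str], budget_s: float = PROMPT_BUDGET_S
) -> Optional[str]:
    """Return the hint if it arrives in time, otherwise drop it.

    A late hint is useless in a live dialogue, so on timeout we return None
    instead of showing a stale suggestion.
    """
    try:
        return await asyncio.wait_for(pending, timeout=budget_s)
    except asyncio.TimeoutError:
        return None


async def _fake_generation(delay_s: float) -> str:
    # Hypothetical stand-in for a call to the local model.
    await asyncio.sleep(delay_s)
    return "Suggest the premium card"


async def demo() -> tuple[Optional[str], Optional[str]]:
    fast = await hint_within_budget(_fake_generation(0.01), budget_s=0.5)
    slow = await hint_within_budget(_fake_generation(1.0), budget_s=0.05)
    return fast, slow


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`asyncio.wait_for` cancels the pending task on timeout, so a slow generation does not keep running after its result has become useless.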

The team reached a working prototype quickly. In the first three weeks, the developers took text transcripts of calls, trained two BERT classifiers on roughly 1,500 dialogues, assembled simple knowledge bases with scripts and connected everything through prompts to a cloud GPT model. The interface was built in a day with Django. This proof of concept was slow, with a delay of 10–15 seconds, but it was enough to defend the idea before the business and get the green light for the MVP. Then the real engineering work began: reducing latency, stabilizing the system and building integrations.

Why everything was simplified

At first, as often happens in AI projects, the team designed an overly ambitious system: its own audio pipeline, several complex classifiers, fine-tuning of a large language model, a vector database and even a self-learning loop. But it quickly became clear that this path would stretch the launch to 12–18 months and sharply increase the risk of failure. Instead of trying to build the "perfect" architecture, the developers began systematically removing everything the first release could do without.

"We didn't fight the problems; we redesigned the system so that these problems wouldn't arise in it."

Dropped fine-tuning in favor of RAG to avoid spending months on annotation and to reduce the risk of hallucinations.
Did not build their own transcription and used ready-made text segments from Voximplant.
Simplified the objection classifier: instead of 15+ classes, kept a binary scheme, "has objection / no objection".
Skipped a heavy vector database for a few megabytes of data and loaded structured JSON files directly into memory.
Moved from cloud APIs to a local Qwen 8B on a GPU server to fit within the latency budget and keep sensitive data inside the company perimeter.
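The RAG and in-memory JSON choices fit together naturally: with a few megabytes of scripts, retrieval can be a dictionary lookup by product, and the retrieved text is simply placed into the prompt. A sketch under assumptions: the JSON layout, key names and prompt wording below are invented for illustration, since the article does not describe the actual file format:

```python
import json
from typing import Any

# Hypothetical layout; the article only says scripts were kept as structured JSON.
EXAMPLE_KB = """
{
  "credit_card": {
    "pitch": "Highlight the grace period.",
    "objections": {"price": "Explain that servicing is free with card spending."}
  }
}
"""


def load_kb(raw: str) -> dict[str, Any]:
    # At this scale a plain dict held in memory replaces a vector database.
    return json.loads(raw)


def build_rag_prompt(kb: dict[str, Any], product: str, has_objection: bool, utterance: str) -> str:
    """Retrieve the relevant scripts by key and put them into the LLM prompt.

    Retrieval-augmented generation: the model answers from supplied context
    instead of knowledge baked in by fine-tuning.
    """
    entry = kb.get(product, {})
    context = [entry.get("pitch", "")]
    if has_objection:
        # Binary objection flag: include all objection scripts for the product
        # rather than choosing one of many fine-grained classes.
        context.extend(entry.get("objections", {}).values())
    context_text = "\n".join(c for c in context if c)
    return (
        "Use only the scripts below to suggest a short reply for the manager.\n"
        f"Scripts:\n{context_text}\n"
        f"Client: {utterance}\n"
        "Hint:"
    )


if __name__ == "__main__":
    kb = load_kb(EXAMPLE_KB)
    print(build_rag_prompt(kb, "credit_card", True, "It's too expensive"))
```

In production the files would be read once at startup, so no disk or network access happens during a call.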

This set of compromises turned out to be key. Cloud models responded in 7–20 seconds, and Qwen 32B, although it gave better answers, still failed the latency test. The more compact Qwen 8B proved good enough for manager prompts and stabilized latency at around two seconds. Local deployment also resolved security questions: call transcripts don't have to be sent to external services, so the team didn't need to build a separate personal-data masking layer and pay for it with extra latency.
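The model comparison amounts to a simple rule: take the best-answering model whose latency still fits the budget. A hedged sketch of that rule, assuming latencies are measured per model on real traffic and judged by a tail percentile rather than the mean. `pick_model` and the 95th-percentile default are assumptions, not details from the article:

```python
import math
from typing import Optional, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def pick_model(
    candidates: list[tuple[str, Sequence[float]]],
    budget_s: float,
    q: float = 95.0,
) -> Optional[str]:
    """Return the first candidate whose q-th percentile latency fits the budget.

    `candidates` is ordered from most preferred (best answers) to least
    preferred, each paired with latencies measured in seconds.
    """
    for name, latencies in candidates:
        if latencies and percentile(latencies, q) <= budget_s:
            return name
    return None
```

Judging by the tail matters here: a model that is fast on average but occasionally takes several seconds would still show stale prompts in a noticeable share of calls.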

What the pilot showed

The most underestimated problem turned out to be data, not models. The team took 200 calls, split them among 12 participants and quickly hit a wall with manual annotation: to classify objections correctly, it's not enough to highlight a phrase; you need to understand the context of the conversation and the sales logic. As a result, the developers reframed the problem itself. Instead of trying to "teach AI to think like an expert," they focused on a narrower goal: detect in time when the manager needs help, then pull up the right script and generate a prompt.

By the end of the pilot, the system achieved an average latency of around two seconds, rising to three seconds in only 2–3% of cases. Service classification accuracy exceeded 70%, and speech recognition accuracy was 92% or higher, depending on connection quality. The team reports that the pilot has already produced a qualitative effect: early signals of convenience, reduced load on mentors and overall usefulness for operators. However, there are no statistically significant conclusions on conversion and KPIs yet; that requires scaling the product and integrating it seamlessly into the CRM.
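Pilot figures like these are simple aggregates over logged calls. A minimal sketch of how they could be computed from per-prompt latencies and labeled classifier outputs; the function names and the threshold parameter are hypothetical, and the article does not publish its evaluation code:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class LatencyReport:
    mean_s: float      # average latency in seconds
    share_over: float  # fraction of hints slower than the threshold


def latency_report(latencies_s: Sequence[float], threshold_s: float) -> LatencyReport:
    if not latencies_s:
        raise ValueError("no latency samples")
    slow = sum(1 for t in latencies_s if t > threshold_s)
    return LatencyReport(mean(latencies_s), slow / len(latencies_s))


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of segments where the classifier's label matches the reference label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```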

What this means

This case shows well that an internal AI product doesn't always require building an ML team from scratch. If a company has strong backend engineers, a clear business pain point and access to processes, it can assemble an MVP faster by strictly simplifying the architecture and dropping unnecessary "smart" components. The main takeaway is not the choice of a specific model but discipline: first solve the business problem, then check the speed and security constraints, and only then add complexity to the stack.

Hamidun News

AI news without noise. Daily editorial selection from 50+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
