Voice AI agent lied to customers and mixed up callers — developer found the cause wasn’t in the prompt

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

A developer who builds voice AI agents for businesses ran one in production for three months and documented every pitfall: the agent lied, mixed up customers, and voice cloning didn't work as intended.

Three Major Agent Failures

Over the first months in production, the voice agent managed to distinguish itself several times.

First incident: the agent told a customer that "administrator Alexey" would call back within an hour. No such Alexey existed: the agent had generated the name from the context of previous calls, mixing fragments of different conversations. The customer waited for the call and then filed a complaint.

The second bug was more serious: the agent began treating all incoming calls as coming from the same person. The session wasn't being reset between calls, so memory of the previous customer leaked into the next conversation. Technically, this is a classic shared-context problem: state was never explicitly isolated per call.

  • The agent called the new customer by the name of the previous caller
  • Remembered details of another customer's order and suggested "continuing the checkout"
  • Confirmed non-existent agreements from previous sessions
  • Apologized for "delays" that never happened, confusing the current call with the previous one
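The leak is structural, so the fix is structural too. Below is a minimal Python sketch of per-call isolation; `CallSession` and `SessionManager` are hypothetical names for illustration, not the author's code. Every call gets a freshly created state object, the model's context is built only from that object, and the object is discarded on hang-up, so nothing is left for the next caller to inherit.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Hypothetical per-call state: everything the agent may 'remember' about a caller."""
    call_id: str
    caller_number: str
    history: list[str] = field(default_factory=list)
    customer_name: str | None = None


class SessionManager:
    """Hypothetical manager: one fresh CallSession per call, destroyed on hang-up."""

    def __init__(self) -> None:
        self._sessions: dict[str, CallSession] = {}

    def start_call(self, caller_number: str) -> CallSession:
        # Always a brand-new object with a unique id: nothing carries over from earlier calls.
        session = CallSession(call_id=str(uuid.uuid4()), caller_number=caller_number)
        self._sessions[session.call_id] = session
        return session

    def end_call(self, call_id: str) -> None:
        # Hard reset: drop the whole object instead of clearing selected fields.
        self._sessions.pop(call_id, None)

    def context_for(self, call_id: str) -> list[str]:
        # The LLM context comes only from this call's own history.
        # A finished or unknown call raises KeyError rather than returning stale data.
        return list(self._sessions[call_id].history)
```

Dropping the session object on hang-up, rather than clearing individual fields, means a forgotten field cannot leak: a new call never sees the old object at all.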

The third case took the longest to detect. For an entire month, the agent spoke with standard TTS synthesis that was being passed off to clients as a "cloned voice." Voice cloning had never been connected because of a configuration error, yet no exception was thrown anywhere: the system simply degraded silently to standard synthesis.

Why the Prompt Doesn't Fix It

The first intuitive reaction is to add instructions to the prompt: "don't make up names," "don't remember previous callers," "always check whether there is a voice profile." The author tried this and explains why it doesn't work systematically.

A language model doesn't distinguish between a prohibition in the prompt and data from the session context: both are simply text in the same context window. If the history of a previous call physically lands in the context window, the model will use it. You cannot instruct away what is already in memory.

"A prompt is a recommendation, not an architectural barrier. The barrier should be in the code."

The solution is to isolate state at the infrastructure level: a hard reset of context between calls, a check that the voice profile is available before the call starts (not during it), and explicit validation of every fact before it is spoken.
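One way to validate facts before they are spoken is to have the model return its reply together with the facts it asserts, then check those facts against the business's own data before anything reaches TTS. The sketch below assumes such structured output (`DraftReply`) and a hypothetical staff directory; both are illustrative, not the author's implementation. It covers the "administrator Alexey" case: an unknown name or an unconfigured callback replaces the whole reply with a neutral phrase.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DraftReply:
    """Hypothetical structured LLM output: the reply plus the facts it asserts."""
    text: str
    staff_mentioned: list[str] = field(default_factory=list)
    promises_callback: bool = False


# Hypothetical ground truth, loaded from the business's own systems, never from the model.
STAFF_DIRECTORY = {"Irina", "Pavel"}
CALLBACKS_ENABLED = False

SAFE_FALLBACK = "Sorry, I can't confirm that right now."


def validate_before_speaking(draft: DraftReply) -> tuple[str, list[str]]:
    """Return the text that may be sent to TTS and a list of detected problems."""
    problems: list[str] = []
    for name in draft.staff_mentioned:
        if name not in STAFF_DIRECTORY:
            problems.append(f"unknown staff member: {name}")
    if draft.promises_callback and not CALLBACKS_ENABLED:
        problems.append("callback promised, but callbacks are not configured")
    # Any failed check replaces the whole reply with a neutral phrase that asserts nothing.
    return (SAFE_FALLBACK if problems else draft.text), problems
```

The weak point of this design is the assumption that the model reports every fact it asserts; a production system would likely need additional checks on the reply text itself.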

Russian Stack and Its Specifics

The author works entirely with Russian tools: a Russian LLM, a Russian TTS provider, and telephony through a Russian operator. This imposes specific constraints.

Documentation for some tools is incomplete or lags behind the API. That is exactly why the voice cloning error stayed invisible: when the requested voice profile is missing, the provider doesn't return an error; it quietly returns standard synthesis with HTTP status 200.

The substitution could only be detected by analyzing the output audio or by explicitly checking the response metadata.
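Because the provider answers with HTTP 200 either way, the status code alone cannot be trusted. Below is a minimal sketch of the metadata check, assuming a hypothetical response object whose metadata reports which voice was actually used. The `TTSResponse` shape and the `voice_id` key are assumptions; real providers name and expose this information differently, if at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TTSResponse:
    """Hypothetical TTS response; real providers use different field names."""
    status_code: int
    audio: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class VoiceFallbackError(RuntimeError):
    """The provider answered successfully but did not use the requested voice."""


def checked_audio(response: TTSResponse, requested_voice_id: str) -> bytes:
    """Return the audio only if it was synthesized with the requested voice."""
    if response.status_code != 200:
        raise RuntimeError(f"TTS request failed with HTTP {response.status_code}")
    # HTTP 200 proves nothing here: compare the requested voice with the one reported.
    used = response.metadata.get("voice_id")  # hypothetical metadata key
    if used != requested_voice_id:
        # A missing key (used is None) also fails: no report means no proof of cloning.
        raise VoiceFallbackError(
            f"requested voice {requested_voice_id!r}, provider reported {used!r}"
        )
    return response.audio
```

Treating a missing report as a failure is deliberate: if the provider does not say which voice it used, the agent should not assume the clone was applied.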

Practical lessons from three months of experience:

  • Check that all resources (voice, profile, session) are available before starting an operation, not during it
  • Log not only errors but "successful" responses: silent degradation is more dangerous than an explicit crash
  • Isolate agent state between sessions at the code level, not the prompt level
  • Test the audio output, not just text logs: synthesis and cloning sound different
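The first two lessons can be combined into a preflight step that runs before a call is accepted. Below is a minimal sketch, assuming each resource check is a hypothetical zero-argument callable (for example, a lookup of the voice profile through the TTS provider's API); `preflight` and `PreflightError` are illustrative names. Any failed check aborts the call loudly, and a successful run is logged as well.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("voice_agent")


class PreflightError(RuntimeError):
    """A required resource is unavailable, so the call must not start."""


def preflight(call_id: str, checks: dict[str, Callable[[], bool]]) -> None:
    """Run every resource check before the call starts; fail loudly on any miss."""
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        logger.error("call %s: preflight failed: %s", call_id, ", ".join(failed))
        raise PreflightError(f"call {call_id}: unavailable resources: {failed}")
    # Log the success too, so a later complaint can be traced to what was verified.
    logger.info("call %s: preflight ok: %s", call_id, ", ".join(checks))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Hypothetical checks; real ones would query the TTS provider, the session store, etc.
    preflight("call-1", {"voice_profile": lambda: True, "session": lambda: True})
```

Logging the successful preflight gives a baseline: if a call later sounds wrong, the logs show whether the voice profile was confirmed for that specific call.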

What This Means

Voice AI agents in production break differently from chatbots: errors are spoken aloud, and the client hears them in real time and can't re-read or ignore them. This makes architectural care critical.

Most of a voice agent's "strange behavior" is not model hallucination but architectural holes in the code around it. It can be fixed. But not with a prompt.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
