Loka built a voice agent on Amazon Nova 2 Sonic with sub-second latency

Q: What is the source?

Originally published on AWS Machine Learning Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Jun 28, 2026. Reading time: 3 min.

Hamidun News Editorial

Loka published a detailed breakdown of the architecture it used to create a voice agent based on Amazon Nova 2 Sonic — AWS's next-generation speech model. The challenge was straightforward: build a bot that customers won't hang up on after waiting a few seconds.

The Problem Being Solved

A robotic voice in a phone bot is more than an aesthetic irritation. For businesses, it means direct losses: the customer hangs up, calls back to speak to a live operator, or switches to a competitor. Brand reputation suffers and support costs rise.

Classical voice systems run through a long chain: speech recognition (ASR) → text conversion → language model → answer generation → speech synthesis (TTS). Each step waits for the previous one, so latency accumulates, and the pause between the customer's question and the bot's answer runs 2 to 5 seconds.
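How the pause adds up can be shown with simple arithmetic. The sketch below sums per-stage latencies for a cascaded pipeline. The stage names and millisecond values are illustrative assumptions chosen to land inside the 2 to 5 second range mentioned above; they are not measurements from Loka or AWS.

```python
# Illustrative per-stage latencies in milliseconds for a cascaded voice pipeline.
# All values are assumptions made up for this example, not measurements.
CASCADE_STAGES_MS = {
    "asr": 600,           # speech recognition after the user stops talking
    "llm": 1200,          # language model produces the answer text
    "tts": 700,           # speech synthesis renders the first audio
    "network_hops": 300,  # transfers between the separate services
}


def total_latency_ms(stages: dict[str, int]) -> int:
    """Stages run one after another, so their latencies simply add up."""
    return sum(stages.values())


if __name__ == "__main__":
    total = total_latency_ms(CASCADE_STAGES_MS)
    print(f"cascaded pause: {total / 1000:.1f} s")
```

Because the stages are sequential, shaving one of them helps only partially; removing the intermediate steps altogether is what changes the total.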

In that time, a person decides the system isn't working and either hangs up or demands a live operator. Loka set out to break this chain and create an agent that responds within the natural pause of a conversation, the way a human would. The solution was Amazon Nova 2 Sonic.

What Nova 2 Sonic Does Differently

Nova 2 Sonic is a multimodal speech-to-speech model from AWS that works directly with audio, bypassing separate ASR transcription and TTS synthesis steps. It takes an audio stream as input and generates an audio stream as output without intermediate conversion to text. This fundamentally changes the latency profile:

Responses begin within 300–500 ms after the user pauses
The model understands natural interruptions in speech and responds correctly to them
The system hears intonation and emotional context — and adapts the response tone accordingly
The feeling of "the system is processing" completely disappears from the dialogue
Integration with business logic through function calling doesn't interrupt the conversation flow

Nova 2 Sonic is available through Amazon Bedrock, allowing companies on AWS to integrate it without switching providers or completely rebuilding their infrastructure.

Production Architecture

Loka implemented real-time audio streaming with minimal buffering. The system doesn't wait for the user's full statement — it begins processing immediately, allowing Nova 2 Sonic to respond precisely at the moment of a natural pause rather than after prolonged silence.
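Streaming with minimal buffering can be pictured as slicing incoming audio into small fixed-size frames and passing each one on as soon as it is complete. The sketch below is a generic illustration, not Loka's implementation: the audio format (16 kHz, 16-bit mono PCM), the 20 ms frame length, and the function names are all assumptions.

```python
from typing import Iterable, Iterator

SAMPLE_RATE_HZ = 16_000  # assumed input format: 16 kHz
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 20            # assumed frame length


def frame_size_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    frame_ms: int = FRAME_MS,
) -> int:
    """Number of bytes that hold frame_ms milliseconds of audio."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


def stream_frames(chunks: Iterable[bytes], frame_bytes: int) -> Iterator[bytes]:
    """Re-slice arbitrary incoming chunks into fixed-size frames.

    Each frame is yielded as soon as enough bytes have arrived, so the
    consumer never waits for the whole utterance.
    """
    pending = bytearray()
    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= frame_bytes:
            yield bytes(pending[:frame_bytes])
            del pending[:frame_bytes]
    if pending:
        yield bytes(pending)  # trailing partial frame at end of stream


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # one second of silence
    incoming = (one_second[i:i + 1000] for i in range(0, len(one_second), 1000))
    for frame in stream_frames(incoming, frame_size_bytes()):
        pass  # a real agent would forward each frame to the model here
```

With these assumed settings the most audio ever held back is one 20 ms frame, which is small next to the response times discussed above.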

"Robotic voice is the main reason customers hang up. It's not a technical problem; it's a trust problem," notes the Loka team.

To pull business data such as order status, customer history, and stock availability, the agent uses function calling while the conversation is in progress. For the customer, this sounds like an instant response rather than a noticeable pause while results load. In production, the system holds up under interruptions, topic switches, and irregular pauses, the scenarios where classical ASR systems most often fail.
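One way to keep a business lookup from stalling the conversation is to run it as a background task while audio continues to flow. The sketch below shows that pattern with Python's asyncio. It does not use the Amazon Bedrock or Nova 2 Sonic API: the tool name `get_order_status`, its arguments, the returned fields, and the dispatch function are hypothetical and exist only for illustration.

```python
import asyncio
import json


async def get_order_status(order_id: str) -> dict:
    """Hypothetical business lookup; a real one would query an order system."""
    await asyncio.sleep(0.01)  # stands in for a database or API call
    return {"order_id": order_id, "status": "shipped"}


# Hypothetical registry mapping tool names to coroutine functions.
TOOLS = {"get_order_status": get_order_status}


async def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool and return its result as JSON text."""
    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = await tool(**json.loads(arguments_json))
    return json.dumps(result)


async def main() -> str:
    # The lookup runs as a task, so other work in the event loop is not blocked.
    lookup = asyncio.create_task(
        handle_tool_call("get_order_status", '{"order_id": "A-1001"}')
    )
    frames_forwarded = 0
    while not lookup.done():
        await asyncio.sleep(0.002)  # stands in for forwarding one audio frame
        frames_forwarded += 1
    return await lookup


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the design is that the lookup and the audio loop share one event loop, so neither has to wait for the other to finish.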

What This Means

Speech-to-speech models remove the main barrier to mass adoption of voice bots — the noticeable latency that destroys the illusion of live conversation. If latency is imperceptible and the voice sounds natural, the boundary between agent and operator blurs. For businesses, this is a direct path to call center automation without harming NPS. Following Nova 2 Sonic, similar models from other providers will enter the market — competition in the voice AI segment is only beginning.
