Amazon Nova Sonic: Three Architectures for Voice Agents

AWS has released a guide to building scalable voice agents with Amazon Nova Sonic. The article covers three architectural patterns for audio processing, ways to minimize latency, and the integration of multi-agent systems through Amazon Bedrock AgentCore. It is aimed at developers building customer service bots and AI assistants.

Khamidun Zhemal

AI monitoring · AWS Machine Learning Blog

May 23, 2026 · 2 min · updated Jul 12, 2026

AI-processed from AWS Machine Learning Blog; edited by Hamidun News

Amazon Nova Sonic: Three Architectures for Voice Agents — Source: AWS Machine Learning Blog. Collage: Hamidun News.


AWS has shared recommendations for building scalable voice agents with Amazon Nova Sonic, a model for natural speech processing in real-time scenarios such as customer service, technical support, appointment booking, and personal assistants. The AWS blog post breaks down three popular architectural patterns, techniques for minimizing latency, and practices for integrating multi-agent systems.

Amazon Nova Sonic: a model for dialogue

Amazon Nova Sonic is a compact model built for voice interaction, available through Amazon Bedrock's bidirectional streaming API (InvokeModelWithBidirectionalStream). Unlike general-purpose large foundation models, Sonic is optimized specifically for low-latency responses and real-time processing of audio streams. Depending on the architecture, it can work with audio directly or with a text transcription.

A key advantage is tool use: the model can call external functions and APIs. An agent can not only answer a question but also take an action, such as checking an order's status, booking a restaurant table, or fetching a weather forecast, all within a single conversation and without switching between applications.
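To make the tool-use flow concrete, here is a minimal Python sketch of the application side: a registry that maps tool names to local functions and a dispatcher that runs whichever tool the model asks for. The request shape (a tool name plus a JSON string of arguments) and the `get_order_status` and `get_weather` tools are hypothetical stand-ins, not the Nova Sonic event schema; in a real agent the tool request would arrive over the Bedrock streaming API and the result would be sent back on the same stream.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: tool name -> local Python function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under the tool name the model will use."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_order_status")
def get_order_status(order_id: str) -> dict:
    # Stand-in for a call to an order-management API.
    return {"order_id": order_id, "status": "shipped"}


@tool("get_weather")
def get_weather(city: str) -> dict:
    # Stand-in for a call to a weather API.
    return {"city": city, "forecast": "sunny"}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the requested tool and return its result as a JSON string.

    Unknown tools produce an error payload instead of an exception, so the
    conversation can continue and the model can tell the user what went wrong.
    """
    fn = TOOLS.get(tool_name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    args = json.loads(arguments_json) if arguments_json else {}
    return json.dumps(fn(**args))


print(dispatch("get_order_status", '{"order_id": "A-1001"}'))
# {"order_id": "A-1001", "status": "shipped"}
```

Returning errors as data rather than raising keeps the voice session alive: the model receives the error payload as the tool result and can apologize or ask a clarifying question.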

Three architectural patterns

AWS describes three main approaches, each with different trade-offs between simplicity and functionality.

Single-turn agentless: the simplest pattern. The user says one phrase and the model responds, with no conversational memory and no session management. It works well for FAQ bots and simple reference systems; it is fast and reliable but unsuitable for complex processes that require multiple steps.
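A single-turn agentless handler reduces to a pure function: transcript in, reply out, with nothing remembered between calls. The Python sketch below assumes speech recognition and synthesis happen elsewhere, and it swaps the model for a hypothetical keyword-matched FAQ table so the stateless shape of the handler is easy to see.

```python
from typing import Dict

# Hypothetical FAQ table; in a real deployment the model would generate
# the answer, but the handler would still read and write no session state.
FAQ: Dict[str, str] = {
    "hours": "We are open from 9 am to 6 pm, Monday to Friday.",
    "refund": "Refunds are processed within 5 business days.",
}
FALLBACK = "Sorry, I can only answer questions about hours and refunds."


def answer(transcript: str) -> str:
    """Stateless single-turn handler: the reply depends only on this utterance.

    No session object is read or written, so any replica of the service can
    handle any request, which is what makes the pattern simple to scale.
    """
    text = transcript.lower()
    for keyword, reply in FAQ.items():
        if keyword in text:
            return reply
    return FALLBACK
```

`answer("What are your hours?")` returns the opening-hours reply, but a follow-up such as `answer("And on Saturday?")` falls through to the fallback because nothing links it to the previous question. That is the limitation the multi-turn pattern addresses.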

Multi-turn with state: the agent remembers the conversation context and can carry on a multi-step dialogue. Hotel booking is a typical example: "What dates?" → "For how many people?" → "Do you have location preferences?" This requires managing the session, storing dialogue variables, and tracking which steps have been completed; Amazon Bedrock AgentCore helps with this.
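The session state this pattern needs can be sketched as a small slot-filling object that records which answers have been collected and decides which question to ask next. This is a minimal Python illustration of the idea, not the AgentCore API: the `BookingSession` class and slot names are hypothetical, and it assumes each user reply answers the current question (a real agent would let the model extract slot values from free speech).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots, in the order the agent asks for them.
QUESTIONS = {
    "dates": "What dates?",
    "guests": "For how many people?",
    "location": "Do you have location preferences?",
}


@dataclass
class BookingSession:
    session_id: str
    slots: Dict[str, str] = field(default_factory=dict)

    def next_question(self) -> Optional[str]:
        """Return the first unanswered question, or None when all slots are filled."""
        for slot, question in QUESTIONS.items():
            if slot not in self.slots:
                return question
        return None

    def record(self, answer: str) -> None:
        """Store the user's answer in the first empty slot."""
        for slot in QUESTIONS:
            if slot not in self.slots:
                self.slots[slot] = answer
                return

    @property
    def complete(self) -> bool:
        return self.next_question() is None
```

Each turn the agent calls `record()` with the user's answer and then speaks `next_question()`; once `complete` is true it can call the booking tool. Between turns the session object has to be persisted somewhere keyed by `session_id`, which is the part a managed service takes off the developer's hands.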

Multi-agent orchestration: several specialized agents work together. For example, one agent handles questions about pricing plans, another handles technical support, and a third handles billing, while a main orchestrator decides which agent receives each request. Strands BidiAgent provides a clean bidirectional stream: it does not just synthesize speech in reply but also processes the user's live audio stream.
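The orchestrator's routing decision can be sketched in a few lines of Python. Production systems typically let a model classify the request; this hypothetical version swaps in simple keyword scoring so the example stays self-contained, and the three specialists (pricing plans or tariffs, technical support, billing) are plain functions rather than Strands or AgentCore agents.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str], str]


def pricing_agent(request: str) -> str:
    return "pricing: " + request


def support_agent(request: str) -> str:
    return "support: " + request


def billing_agent(request: str) -> str:
    return "billing: " + request


# Hypothetical routing table: each specialist with the keywords it handles.
ROUTES: List[Tuple[Agent, Tuple[str, ...]]] = [
    (pricing_agent, ("plan", "price", "tariff")),
    (support_agent, ("error", "not working", "reset")),
    (billing_agent, ("invoice", "charge", "payment")),
]


def orchestrate(request: str, default: Agent = support_agent) -> str:
    """Send the request to the specialist whose keywords match it best."""
    text = request.lower()
    best_agent, best_score = default, 0
    for agent, keywords in ROUTES:
        # Count how many of this agent's keywords appear in the request.
        score = sum(keyword in text for keyword in keywords)
        if score > best_score:
            best_agent, best_score = agent, score
    return best_agent(request)
```

Keeping routing in one function means specialists can be added or replaced without touching each other; swapping the keyword scoring for a model-based classifier changes only the body of `orchestrate`.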

Minimizing latency in practice

Response time is the main challenge for voice agents. Users notice even a 100–200 ms pause between the end of their question and the start of the reply; it feels unnatural, and the agent starts to seem slow or frozen. AWS recommends several techniques:

Streaming API instead of batch: don't wait for the model's full response; send the first audio chunks to the user immediately
Tool call caching: repeated identical requests return the cached result instead of calling the tool again
Session segmentation: the system automatically detects the boundaries of logical blocks within the conversation
Edge deployment: place the model closer to the end user
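Of these techniques, tool call caching is the easiest to show in isolation. The Python sketch below is a minimal time-to-live cache keyed by tool name and arguments, assuming tool results may safely be reused for a short period; the `ToolCallCache` class, the 60-second default TTL, and the injectable clock are illustrative choices, not AWS recommendations.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


class ToolCallCache:
    """Cache tool results by (tool name, arguments) for a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def call(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        # sort_keys makes argument order irrelevant to the cache key.
        key = (name, json.dumps(kwargs, sort_keys=True))
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached result: skip the slow API call
        result = fn(**kwargs)
        self._entries[key] = (now, result)
        return result
```

The TTL is the design lever: data such as a weather forecast tolerates a minute of staleness, while an order status after a cancellation may not, so each tool would reasonably get its own TTL or bypass the cache entirely.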

What this means

Voice interfaces are becoming a standard way to interact, from smart speakers to corporate call centers. Previously, companies had to assemble such systems from separate components; now AWS offers a ready-made stack of model, tools, and orchestration. For anyone building a customer service bot or an AI assistant, the guide is a practical resource grounded in firsthand experience.

Hamidun News

AI news without noise. Daily editorial selection from 50+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
