AWS explained how to deploy Pipecat voice AI agents in Bedrock AgentCore Runtime
AI-processed from AWS Machine Learning Blog; edited by Hamidun News
AWS has released the first part of a practical guide to deploying Pipecat voice agents in Amazon Bedrock AgentCore Runtime. The focus is not on the models themselves but on the transport layer, which determines whether the conversation sounds natural or the user experiences pauses and delays.
Why latency matters
A voice agent almost always operates in challenging conditions: a browser, mobile application or phone call, unstable network, load spikes and the expectation of real-time response. AWS emphasizes that for natural dialogue, latency should remain almost imperceptible — typically within one second from the end of the user's statement to the start of the agent's response. Otherwise the conversation breaks down: the interlocutor interrupts the agent, thinks it has frozen, or simply leaves. This is especially critical for support, virtual assistants and outbound campaigns.
To mitigate this risk, AWS suggests running Pipecat agents in Bedrock AgentCore Runtime — a secure serverless environment for AI agents. Each session runs in an isolated microVM, the platform automatically scales for traffic spikes and can maintain continuous conversations for up to eight hours. This matters for long multi-step calls where you cannot simply cut off the context. Another advantage is paying only for the resources actually consumed, without having to maintain server reserves for peak load.
Pipecat itself can be packaged in a container and deployed with minimal overhead if the image is built for ARM64.
What options exist
In the first part, AWS reviews the path from client to agent, the "first hop" that most strongly affects perceived speed. The company compares four approaches: plain WebSockets, WebRTC with a TURN relay, managed WebRTC through specialized providers, and telephony for working with the PSTN and contact centers. Each option strikes its own balance between simplicity, reliability and connection quality.
The idea is simple: there is no single best transport for all scenarios, but there are clear use cases where each one looks like a reasonable starting point.
- WebSockets: the simplest option for prototypes and light scenarios in web and mobile applications.
- WebRTC with TURN: the stronger choice when you need lower latency and resilience on poor networks.
- Managed WebRTC: the path to production when you want to offload the global media network, analytics and relay infrastructure to an external service.
- Telephony: an option for calls, IVR replacement, outbound campaigns and integration with contact centers.
For WebSockets, AWS shows the most direct approach. The client first requests a signed address from an intermediate server; that server generates a pre-signed URL with a SigV4 signature via the AWS SDK; the browser then connects directly to the agent at the /ws address. This keeps credentials off the client, and once the connection is established, traffic flows without an extra intermediary. AWS calls this a good starting point: it is simpler than the alternatives, natively supported by most clients, and suitable for quickly validating a product.
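To make the pre-signing step concrete, here is a minimal sketch of what the intermediate server does, written with only the Python standard library. In the AWS guide the AWS SDK produces the signature; this hand-rolled SigV4 query-string signing is a stand-in for illustration, not the guide's code. The host, the path, the `bedrock-agentcore` service name and the function name `presign_ws_url` are assumptions, so check the AgentCore documentation for the real endpoint format.

```python
import datetime as dt
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_ws_url(host, path, region, service, access_key, secret_key,
                   session_token=None, expires=300, now=None):
    """Return a wss:// URL that carries a SigV4 query-string signature."""
    now = now or dt.datetime.now(dt.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/{service}/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    if session_token:  # temporary credentials add a signed token parameter
        params["X-Amz-Security-Token"] = session_token

    # Canonical query string: parameters sorted by name, strictly URI-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET",
        quote(path, safe="/~"),           # canonical URI
        canonical_query,
        f"host:{host}\n",                 # canonical headers, newline-terminated
        "host",                           # signed headers
        hashlib.sha256(b"").hexdigest(),  # hash of the empty request body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    return f"wss://{host}{path}?{canonical_query}&X-Amz-Signature={signature}"


if __name__ == "__main__":
    # Placeholder host and credentials; nothing here contacts AWS.
    print(presign_ws_url("agent.example.com", "/ws", "us-east-1",
                         "bedrock-agentcore", "AKIDEXAMPLE", "not-a-real-secret"))
```

The browser only ever sees the resulting URL, which expires after `expires` seconds, while the long-lived credentials stay on the intermediate server.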
What to consider in production
If the goal is not a demo but a stable conversational interface, AWS recommends looking toward WebRTC. This transport typically works over UDP, better handles fluctuating network conditions and delivers audio faster in both directions. But AgentCore has architectural nuances.
A direct peer-to-peer connection does not work here because the runtime environment does not receive a public IP. The STUN scenario also does not work as the primary path: AWS notes that NAT Gateway uses symmetric NAT, which breaks direct connection hole-punching. Therefore the practical recommendation is TURN relay and VPC configuration for the runtime.
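The reason symmetric NAT defeats STUN-based hole-punching can be shown with a toy model. The sketch below is a simplified simulation, not real networking code: the class names, addresses and port numbers are all made up for illustration. A cone-style NAT reuses one external port per internal socket, so the address learned from a STUN server is also valid for the peer. A symmetric NAT allocates a new external port per destination, so the address learned via STUN is not the one the peer would actually see.

```python
from itertools import count


class ConeNat:
    """Reuses one external port per internal socket, whatever the destination."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self._ports = count(40000)  # arbitrary starting port for the toy model
        self._map = {}

    def _key(self, src, dst):
        return src

    def translate(self, src, dst):
        """Return the (ip, port) that `dst` sees for traffic from `src`."""
        key = self._key(src, dst)
        if key not in self._map:
            self._map[key] = next(self._ports)
        return (self.public_ip, self._map[key])


class SymmetricNat(ConeNat):
    """Allocates a separate external port for every (source, destination) pair."""

    def _key(self, src, dst):
        return (src, dst)


if __name__ == "__main__":
    agent = ("10.0.1.5", 5000)         # private address of the runtime
    stun = ("198.51.100.1", 3478)      # hypothetical STUN server
    peer = ("203.0.113.9", 6000)       # hypothetical browser peer

    for nat in (ConeNat("192.0.2.10"), SymmetricNat("192.0.2.10")):
        seen_by_stun = nat.translate(agent, stun)
        seen_by_peer = nat.translate(agent, peer)
        print(type(nat).__name__, seen_by_stun == seen_by_peer)
```

With the symmetric model the two mappings differ, which is why a relay that both sides connect to outbound (TURN) is needed instead.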
In the working setup, you configure the ICE_SERVER_URLS variable both on the intermediate server and in the agent's environment, then place AgentCore Runtime in a private VPC subnet and give it outbound access via a NAT Gateway.
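As a rough illustration of the configuration step, the sketch below reads ICE_SERVER_URLS and turns it into a list of ICE server entries that a WebRTC stack could consume. The source names only the variable; the comma-separated format, the `load_ice_servers` helper and the example hostnames are assumptions made for this sketch, and TURN credentials are left out.

```python
import os

VALID_SCHEMES = ("stun:", "turn:", "turns:")


def load_ice_servers(env=None):
    """Parse ICE_SERVER_URLS (assumed comma-separated) into ICE server dicts."""
    env = os.environ if env is None else env
    raw = env.get("ICE_SERVER_URLS", "")
    urls = [u.strip() for u in raw.split(",") if u.strip()]

    # Fail early on anything that is not a STUN/TURN URL.
    bad = [u for u in urls if not u.startswith(VALID_SCHEMES)]
    if bad:
        raise ValueError(f"unsupported ICE server URLs: {bad}")

    return [{"urls": u} for u in urls]


if __name__ == "__main__":
    # Hypothetical value; the same setting would be applied on the
    # intermediate server and in the agent's environment.
    example = {"ICE_SERVER_URLS": "turn:relay.example.com:3478,turns:relay.example.com:443"}
    print(load_ice_servers(example))
```

Reading the same variable on both sides keeps the intermediate server and the agent offering the same relay candidates.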
As the AWS-native option for TURN, the company offers Amazon Kinesis Video Streams: the service provides temporary, automatically rotated ICE credentials through the GetIceServerConfig API. This eliminates external dependencies, but there are limitations. An active signaling channel costs $0.03 per month, and each channel is limited to 5 TPS, so at high volumes of new sessions you will need to distribute load across multiple channels. The runtime also still needs internet access to reach KVS.
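The channel limit translates into simple capacity arithmetic. The sketch below uses the two figures quoted above (5 TPS and $0.03 per channel per month) and assumes one GetIceServerConfig call per new session; the helper names, the `headroom` factor and the channel names are hypothetical.

```python
import math
from itertools import cycle

KVS_TPS_PER_CHANNEL = 5            # per-channel limit quoted in the article
CHANNEL_COST_USD_PER_MONTH = 0.03  # per active signaling channel


def channels_needed(new_sessions_per_second, headroom=1.0):
    """Channels required if each new session makes one credential request."""
    demand = new_sessions_per_second * headroom
    return max(1, math.ceil(demand / KVS_TPS_PER_CHANNEL))


def monthly_channel_cost(channel_count):
    """Monthly cost in USD of keeping that many signaling channels active."""
    return round(channel_count * CHANNEL_COST_USD_PER_MONTH, 2)


def make_channel_picker(channel_names):
    """Return a function that hands out channels in round-robin order."""
    rotation = cycle(channel_names)
    return lambda: next(rotation)


if __name__ == "__main__":
    n = channels_needed(12)  # 12 new sessions per second -> 3 channels
    names = [f"voice-agent-turn-{i}" for i in range(n)]  # hypothetical names
    pick = make_channel_picker(names)
    print(n, monthly_channel_cost(n), [pick() for _ in range(4)])
```

Round-robin is the simplest way to spread requests; a production setup might also need retries or backoff when a channel is throttled.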
AWS also separately mentions managed WebRTC providers from the AWS Marketplace. This option is useful if in addition to transport you need globally distributed SFU/TURN nodes, built-in observability and support for multi-user rooms, not just one-on-one dialogue.
For telephony scenarios the logic is similar: the agent still maintains a continuous two-way audio stream but connects to the telecom provider via SIP, WebSocket or WebRTC. Pipecat already provides ready-made transports and serializers, so the task is not to build a voice stack from scratch but to choose the right channel.
What this means
AWS effectively shows that the bottleneck in voice AI agents has long since shifted from the model to the infrastructure for audio delivery. For teams this is a useful guideline: you can start with WebSockets, but for serious production you will almost inevitably need to choose between WebRTC, managed media networks and telephony — depending on where exactly the user is talking to the agent.