Mira Murati unveiled the first system for natural real-time dialogue with AI
Mira Murati’s Thinking Machines Lab introduced TML-Interaction-Small, a 276-billion-parameter model that processes audio, video, and text simultaneously.

Thinking and listening at the same time is something most AI systems still cannot do. Thinking Machines Lab, founded by Mira Murati, has presented the first prototype that changes this. The TML-Interaction-Small model works like a real conversation between people: it listens to you and prepares an answer at the same time.
How the multi-threaded architecture works
TML-Interaction-Small is a 276-billion-parameter Mixture-of-Experts model in which only 12 billion parameters are active. Its main architectural difference: the system processes audio, video, and text simultaneously, in a single data stream. All input is divided into chunks of 200 milliseconds, short enough for the model to stay synchronized with a real conversation and keep up with the pace of human speech.
One more detail: the system works without external modules for voice-activity detection. Such modules usually become a bottleneck, adding latency and complicating the architecture. Here, that function is built directly into the neural network itself, which eliminates unnecessary delays and makes the system far more responsive.
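The 200-millisecond chunking can be illustrated with a minimal sketch. This is not code from the TML release; the sample rate and the `chunk_stream` helper are assumptions chosen purely for illustration, assuming 16 kHz mono audio.

```python
# Hypothetical sketch of 200 ms chunking; names and sample rate are assumed.
SAMPLE_RATE = 16_000                            # samples per second (assumed)
CHUNK_MS = 200                                  # chunk length cited for the model
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # 3200 samples per chunk

def chunk_stream(samples):
    """Yield fixed-size 200 ms chunks from a flat sequence of audio samples."""
    for start in range(0, len(samples) - CHUNK_SAMPLES + 1, CHUNK_SAMPLES):
        yield samples[start:start + CHUNK_SAMPLES]

# One second of audio splits into five 200 ms chunks.
second = [0.0] * SAMPLE_RATE
chunks = list(chunk_stream(second))
print(len(chunks), len(chunks[0]))  # 5 3200
```

At 16 kHz, a chunk is only 3,200 samples, which is why the model can react well before a full sentence has been spoken.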
Parallel engines for different tasks
The system runs two components in parallel. The first, a real-time interaction model, handles live dialogue with the user and supports full-duplex exchange: you can interrupt the system or speak over it. The second, an asynchronous background model, thinks in the background, works with external tools and databases, and always has full access to the conversation context.
- The first engine handles fast, real-time responses
- The second engine provides deep thinking and complex operations
- Both components see the full context of the entire conversation
- Perception does not freeze while an answer is being generated
- Information is processed as a continuous stream, not in separate stages
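The split described above can be sketched with two cooperating coroutines sharing one conversation context: a fast acknowledgment loop and a slower background worker. All names here are hypothetical; this illustrates only the control flow, not the actual TML implementation.

```python
# Illustrative two-engine sketch: a low-latency interaction loop and an
# asynchronous background worker, both reading the same shared context.
import asyncio

context = []  # shared conversation history, visible to both engines

async def interaction_engine(utterances):
    """Fast path: acknowledge each utterance as soon as it arrives."""
    replies = []
    for text in utterances:
        context.append(("user", text))
        replies.append(f"ack: {text}")   # immediate, low-latency response
        await asyncio.sleep(0)           # yield so the background engine runs
    return replies

async def background_engine(done):
    """Slow path: keep running, then summarize the full shared context."""
    while not done.is_set():
        await asyncio.sleep(0)           # never blocks the fast path
    return f"summary of {len(context)} turns"

async def main():
    done = asyncio.Event()
    bg = asyncio.create_task(background_engine(done))
    replies = await interaction_engine(["hi", "book a table"])
    done.set()
    summary = await bg
    return replies, summary

replies, summary = asyncio.run(main())
print(replies)   # ['ack: hi', 'ack: book a table']
print(summary)   # summary of 2 turns
```

The key property the article describes is visible here: the fast engine never waits for the slow one, yet both operate over the same context.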
From sequential to parallel processing
Almost all modern AI assistants follow a sequential scheme: you finish speaking → the system freezes perception → it processes your words → it produces a finished answer. TML-Interaction-Small breaks this logic: it listens to the user and prepares an answer simultaneously, as in a real dialogue between two people. This approach requires a fundamentally different architecture. Instead of discrete turns, the system processes multimodal data as a continuous stream, which lets the model capture intonation, pauses, emotion, and conversational context. As a result, the assistant doesn't feel robotic; it feels like a living conversation partner.
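The difference between the two schemes is easiest to see in toy form. The functions below are hypothetical stand-ins, not real models; they only contrast the control flow of waiting for a full utterance versus refining a draft after every incoming word.

```python
# Toy contrast between turn-based and incremental processing (illustrative only).

def turn_based(words):
    """Perception freezes: the reply is computed only after the full utterance."""
    utterance = " ".join(words)                # wait for everything first
    return f"reply to: {utterance}"

def incremental(words):
    """Streaming: a running draft reply is refined after each incoming word."""
    draft = ""
    for word in words:
        draft = (draft + " " + word).strip()   # update the running hypothesis
        latest = f"draft reply to: {draft}"    # available at every step
    return latest

words = ["book", "a", "table"]
print(turn_based(words))    # reply to: book a table
print(incremental(words))   # draft reply to: book a table
```

Both end at the same answer, but the incremental version has a usable hypothesis at every step, which is what allows interruptions and overlapping speech.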
What this means for interaction
This is the first practical step toward truly natural dialogue between humans and AI. Instead of waiting for the next answer, you will be able to interrupt, clarify, and argue, interacting as you would with a real consultant. For companies, this opens new opportunities to build assistants that feel alive rather than cold and detached.