Mira Murati unveiled the first system for natural real-time dialogue with AI
Mira Murati’s Thinking Machines Lab introduced TML-Interaction-Small, a 276-billion-parameter model that processes audio, video, and text simultaneously.

Thinking and listening at the same time is something most AI systems still cannot do. Thinking Machines Lab, founded by Mira Murati, has presented the first prototype that changes this. The TML-Interaction-Small model works like a real conversation between people: it listens to you and prepares an answer at the same time.
How the multi-threaded architecture works
TML-Interaction-Small is a 276-billion-parameter Mixture-of-Experts model in which only 12 billion parameters are active. Its main architectural difference: the system processes audio, video, and text simultaneously, in a single data stream. All input is divided into chunks of 200 milliseconds, short enough for the model to stay synchronized with a real conversation and keep up with the pace of human speech.
One more detail: the system works without external modules for voice-activity detection. Such modules usually become a bottleneck, adding latency and complicating the architecture. Here, that function is built directly into the neural network itself, which eliminates unnecessary delays and makes the system far more responsive.
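The 200-millisecond chunking can be illustrated with a minimal sketch. This is not code from the TML release; the sample rate and the `chunk_stream` helper are assumptions chosen purely for illustration, assuming 16 kHz mono audio.

```python
# Hypothetical sketch of 200 ms chunking; names and sample rate are assumed.
SAMPLE_RATE = 16_000                            # samples per second (assumed)
CHUNK_MS = 200                                  # chunk length cited for the model
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # 3200 samples per chunk

def chunk_stream(samples):
    """Yield fixed-size 200 ms chunks from a flat sequence of audio samples."""
    for start in range(0, len(samples) - CHUNK_SAMPLES + 1, CHUNK_SAMPLES):
        yield samples[start:start + CHUNK_SAMPLES]

# One second of audio splits into five 200 ms chunks.
second = [0.0] * SAMPLE_RATE
chunks = list(chunk_stream(second))
print(len(chunks), len(chunks[0]))  # 5 3200
```

At 16 kHz, a chunk is only 3,200 samples, which is why the model can react well before a full sentence has been spoken.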
Parallel engines for different tasks
The system runs two components in parallel. The first, a real-time interaction model, handles live dialogue with the user and supports full-duplex exchange: you can interrupt the system or speak over it. The second, an asynchronous background model, thinks in the background, works with external tools and databases, and always has full access to the conversation context.
- The first engine handles fast, real-time responses
- The second engine provides deep thinking and complex operations
- Both components see the full context of the entire conversation
- Perception does not freeze while an answer is being generated
- Information is processed as a continuous stream, not in separate stages
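The split described above can be sketched with two cooperating coroutines sharing one conversation context: a fast acknowledgment loop and a slower background worker. All names here are hypothetical; this illustrates only the control flow, not the actual TML implementation.

```python
# Illustrative two-engine sketch: a low-latency interaction loop and an
# asynchronous background worker, both reading the same shared context.
import asyncio

context = []  # shared conversation history, visible to both engines

async def interaction_engine(utterances):
    """Fast path: acknowledge each utterance as soon as it arrives."""
    replies = []
    for text in utterances:
        context.append(("user", text))
        replies.append(f"ack: {text}")   # immediate, low-latency response
        await asyncio.sleep(0)           # yield so the background engine runs
    return replies

async def background_engine(done):
    """Slow path: keep running, then summarize the full shared context."""
    while not done.is_set():
        await asyncio.sleep(0)           # never blocks the fast path
    return f"summary of {len(context)} turns"

async def main():
    done = asyncio.Event()
    bg = asyncio.create_task(background_engine(done))
    replies = await interaction_engine(["hi", "book a table"])
    done.set()
    summary = await bg
    return replies, summary

replies, summary = asyncio.run(main())
print(replies)   # ['ack: hi', 'ack: book a table']
print(summary)   # summary of 2 turns
```

The key property the article describes is visible here: the fast engine never waits for the slow one, yet both operate over the same context.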
From sequential to parallel processing
Almost all modern AI assistants follow a sequential scheme: you finish speaking → the system freezes perception → it processes your words → it produces a finished answer. TML-Interaction-Small breaks this logic: it listens to the user and prepares an answer simultaneously, as in a real dialogue between two people. This approach requires a fundamentally different architecture. Instead of discrete turns, the system processes multimodal data as a continuous stream, which lets the model capture intonation, pauses, emotion, and conversational context. As a result, the assistant doesn't feel robotic; it feels like a living conversation partner.
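The difference between the two schemes is easiest to see in toy form. The functions below are hypothetical stand-ins, not real models; they only contrast the control flow of waiting for a full utterance versus refining a draft after every incoming word.

```python
# Toy contrast between turn-based and incremental processing (illustrative only).

def turn_based(words):
    """Perception freezes: the reply is computed only after the full utterance."""
    utterance = " ".join(words)                # wait for everything first
    return f"reply to: {utterance}"

def incremental(words):
    """Streaming: a running draft reply is refined after each incoming word."""
    draft = ""
    for word in words:
        draft = (draft + " " + word).strip()   # update the running hypothesis
        latest = f"draft reply to: {draft}"    # available at every step
    return latest

words = ["book", "a", "table"]
print(turn_based(words))    # reply to: book a table
print(incremental(words))   # draft reply to: book a table
```

Both end at the same answer, but the incremental version has a usable hypothesis at every step, which is what allows interruptions and overlapping speech.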
What this means for interaction
This is the first practical step toward truly natural dialogue between humans and AI. Instead of waiting for the next answer, you will be able to interrupt, clarify, and argue, interacting as you would with a real consultant. For companies, this opens new opportunities to build assistants that feel alive rather than cold and detached.