OpenAI introduced GPT-Realtime-2 with reasoning in live dialogue
OpenAI launched three new voice models: GPT-Realtime-2 with level-5 reasoning, a translation model supporting 70+ languages, and a streaming version of Whisper for real-time transcription

OpenAI has released three new voice models for its API, giving developers a way to integrate fifth-level (GPT-5-class) reasoning directly into audio applications and voice interfaces. The move is another step in the battle for dominance in the AI market.
GPT-Realtime-2: Real-Time Reasoning
GPT-Realtime-2 brings complex logical reasoning capabilities to live voice dialogue for the first time. Unlike simple voice assistants, the new model understands the nuance of conversation context and can handle multi-step tasks without losing meaning. This is important for applications requiring consultation, planning, analytics, or technical support — where simple templated responses just won't do. The model processes speech in real time, allowing users to speak freely without waiting for a processing pause. Responses arrive at natural speed, creating the impression of dialogue with a real person.
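Real-time processing of this kind typically means streaming audio to the model in small fixed-size frames rather than uploading a finished recording. A minimal sketch of that client-side chunking logic (the 20 ms frame size and 16 kHz/16-bit mono format are illustrative assumptions, not documented parameters of the API):

```python
# Sketch: split raw PCM audio into small frames for low-latency streaming.
# Frame duration and audio format below are assumptions for illustration;
# a real realtime API defines its own required format.

SAMPLE_RATE = 16_000      # samples per second (assumed)
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # frame duration in milliseconds (assumed)
FRAME_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 640 bytes

def frames(pcm: bytes):
    """Yield successive fixed-size frames; the last frame may be shorter."""
    for offset in range(0, len(pcm), FRAME_BYTES):
        yield pcm[offset:offset + FRAME_BYTES]

# One second of silence -> 50 frames of 640 bytes each.
second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
chunks = list(frames(second))
print(len(chunks), len(chunks[0]))  # 50 640
```

Sending frames this small is what lets the model begin responding while the user is still speaking, instead of waiting for the utterance to end.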
Multilingual Translation and Transcription
OpenAI released a separate translation model supporting over 70 input languages. This allows developers to build global applications without needing to duplicate models for each language — one model covers most of the world's population. Additionally, a streaming version of Whisper for transcription has been announced. It processes audio in real time and delivers text as sound arrives. This is critical for applications like video calls, live translators, and voice assistants, where latency directly impacts UX.
Three key components:
- GPT-Realtime-2 for voice reasoning and dynamic dialogue
- Translation model supporting 70+ input languages
- Streaming Whisper for low-latency audio transcription
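The practical payoff of a streaming transcriber is that partial text arrives while audio is still coming in. A minimal sketch of the consumer-side pattern, with a stubbed event stream standing in for the actual API (the "delta"/"final" event names are assumptions for illustration, not the real event schema):

```python
from typing import Iterator

# Stub event stream standing in for a real streaming-transcription API.
# Event names and payload shape are illustrative assumptions.
def fake_events() -> Iterator[dict]:
    yield {"type": "delta", "text": "Hello"}
    yield {"type": "delta", "text": ", world"}
    yield {"type": "final", "text": "Hello, world."}

def consume(events: Iterator[dict]) -> str:
    """Render partial hypotheses as they arrive; return the final transcript."""
    partial = ""
    for event in events:
        if event["type"] == "delta":
            partial += event["text"]   # update the live caption in place
        elif event["type"] == "final":
            return event["text"]       # the committed transcript
    return partial

print(consume(fake_events()))  # Hello, world.
```

For a video-call or live-translation UI, the deltas drive the on-screen caption and the final event replaces it with the settled text, which is why end-to-end latency matters more here than batch accuracy.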
Pricing Strategy: Market Capture
OpenAI has set aggressive prices on the new models, making them accessible to small developer teams and startups. The company is clearly targeting rapid market share capture in the voice AI applications space. This approach contrasts with the positioning of its text models, where OpenAI maintains premium pricing. The investment in making voice models accessible signals that OpenAI sees voice as the next frontier of AI interaction. Whoever captures developers first in this space will have a strong competitive advantage.
What This Means
Voice AI interfaces are transitioning from the experimental phase to a practical part of the developer stack. More accessible pricing lowers the barrier to entry: a startup can now embed speech AI into its application without major investment. This will accelerate the emergence of new voice applications on the market.