
OpenAI launched GPT-Realtime-2 and two more voice models via the API

OpenAI expanded the API with three voice models: the updated GPT-Realtime-2 and two new ones. They let apps recognize speech, synthesize it, and translate conversations in real time.

Source: 3DNews AI. Collage: Hamidun News.

OpenAI announced an expansion of voice capabilities in its API — developers now have access to an updated GPT-Realtime-2 model and two new voice models for speech recognition, synthesis, and translation.

Three New Voice Models in the API

Three models have been added to the API: an updated GPT-Realtime-2 and two completely new ones. They cover different tasks: recognizing user speech, synthesizing spoken responses, and translating conversations between languages in real time. This means developers can now embed voice interaction directly into their applications without relying on external speech recognition and synthesis services. Previously it took several providers, one for recognition, another for synthesis, a third for translation; now everything is in one place.
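The "one provider instead of three" idea can be sketched as a single pipeline with three stages. This is a minimal illustration, not OpenAI's actual SDK: the stage functions are injected as plain callables, and in a real app each would wrap the corresponding API call (recognition, the language model, synthesis).

```python
from dataclasses import dataclass
from typing import Callable

# A minimal sketch of the unified voice pipeline the article describes:
# recognition, reasoning, and synthesis behind one interface. The stage
# functions are injected, so the flow can be exercised offline with stubs.

@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]   # speech-to-text stage
    respond: Callable[[str], str]        # language-model stage
    synthesize: Callable[[str], bytes]   # text-to-speech stage

    def turn(self, audio_in: bytes) -> bytes:
        """One conversational turn: hear, understand, speak."""
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)
```

With the real API, each callable would be a thin wrapper around one endpoint; the application code above would not change.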

What the New Models Can Do

  • Speech recognition (speech-to-text) with support for many languages
  • Speech synthesis (text-to-speech) with natural sound and intonation
  • Real-time conversation translation while preserving context
  • Low latency for interactive applications (streaming)
  • Deep integration with GPT-4 for semantic understanding
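The low-latency streaming item above boils down to one pattern: process audio in small chunks and emit partial results as soon as they are ready, instead of waiting for the full utterance. A hedged sketch, with `recognize_chunk` standing in for a hypothetical streaming recognition call:

```python
from typing import Callable, Iterable, Iterator

# Sketch of the streaming pattern behind "low latency": each audio chunk
# is recognized as it arrives, and the growing partial transcript is
# yielded immediately so the UI can react before the user stops talking.
# `recognize_chunk` is a placeholder for a real streaming endpoint.

def stream_transcribe(
    chunks: Iterable[bytes],
    recognize_chunk: Callable[[bytes], str],
) -> Iterator[str]:
    partial = ""
    for chunk in chunks:
        partial += recognize_chunk(chunk)
        yield partial  # partial transcript after each chunk
```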

The models are trained on large volumes of audio data and perform well in English as well as other languages. The GPT-Realtime-2 update improves natural speech processing, context understanding, and response speed. Developers get tools to build applications that hear the user, understand what they are saying, and answer with a voice, which matters for voice assistants, call centers, educational applications, and interactive services.

How It Works in Practice

Imagine a language-learning application. A student speaks in a foreign language; the API transcribes the speech (speech-to-text), sends the text to GPT-4 for checking and correction, then voices the result as natural speech (text-to-speech), all in real time. Or consider a translator application: a tourist speaks in Russian, and the API translates and voices the result in English on the fly, without the turn-taking delays of tools like Google Translate.
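The translator scenario adds one detail the article highlights: context is preserved across turns, so pronouns and topic carry over between utterances. A sketch under the same assumptions as before (all stage functions are hypothetical stand-ins for real API calls):

```python
from typing import Callable, List, Tuple

# Sketch of the tourist-translator turn: transcribe the utterance,
# translate it with the running conversation as context, record the
# pair in the history, then voice the translation. Stage functions are
# injected placeholders for the speech and translation endpoints.

def translate_turn(
    audio: bytes,
    history: List[Tuple[str, str]],  # (source, translation) pairs so far
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, List[Tuple[str, str]]], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    source = transcribe(audio)
    target = translate(source, history)  # context-aware translation
    history.append((source, target))     # preserve context for next turn
    return synthesize(target)
```

Keeping `history` outside the function means the same turn logic serves a whole conversation; each call sees everything said before it.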

Availability and Competition

For now, the models are available only through the API for developers; they will not appear in ChatGPT or other OpenAI consumer applications, at least not in the near future. This lets OpenAI release new capabilities to specialists first, refine them on real applications, and then, if warranted, integrate them into consumer products. API prices will be higher than for text models but lower than those of competitors (for example, Google Cloud Speech-to-Text). OpenAI competes here with Google, Amazon Polly, Microsoft Azure Speech Services, and other cloud platforms. Voice APIs are a competitive field where every millisecond of latency and every percentage point of accuracy matter.

Voice interfaces are no longer exotic; they are becoming the standard for modern applications.

What This Means

Voice interfaces are becoming more accessible. Any developer can now add voice interaction with AI to an application without costly integration of third-party services. This will accelerate the arrival of voice AI applications on the market and make interaction with services more natural.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.