OpenAI updated ChatGPT to more accurately detect risk in sensitive conversations
OpenAI has updated ChatGPT’s safeguards for sensitive conversations. The model now does a better job of noticing when risk emerges not in a single message, but builds up gradually over the course of a dialogue or even across separate chats.

OpenAI described safety updates to ChatGPT that help the model better understand context in sensitive conversations. The system has become more accurate at noticing when risk doesn't manifest immediately but accumulates as the dialogue progresses or even across separate chats.
Why Context Matters
Taken in isolation, a user's message might be neutral or ambiguous, and without the preceding exchanges such a request looks harmless. But if there were earlier signs of distress, talk of self-harm, or hints at harming others, its meaning changes dramatically. OpenAI focused the update precisely on such cases: the model was trained to connect signals across multiple messages and to heighten caution not in every conversation indiscriminately, but only where genuinely alarming signs appear.
The company says these are rare but critically important scenarios, primarily involving suicide, self-harm, and threats to others. In such situations, ChatGPT should not merely respond formally: it should refuse dangerous details in time, de-escalate the conversation, and gently steer the user toward safer sources of help. The goal of the update is not to make the model overly anxious, but to teach it to distinguish ordinary conversations from genuinely risky episodes.
What Changed
The key innovation is safety summaries: brief factual notes about important safety context. They are created by a separate model trained for safety-reasoning tasks and used only in the rare cases where there is a serious risk signal. According to OpenAI's description, these notes are not general personalization and do not become long-term memory about the user: they are stored for a limited time and applied only when past context is truly needed for a safer response. In practice, the summaries are meant to do the following (a rough sketch of how such a flow might look in code appears after the list):
- Match signals from current and past messages
- Help account for risk across separate chats
- Signal the model when conversation de-escalation is needed
- Strengthen refusal of dangerous request details
- Redirect the user toward safer alternatives
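OpenAI has not published implementation details, so the following Python sketch is purely illustrative. Every name in it (SafetyContextStore, risk_score, SUMMARY_TTL_SECONDS, the threshold value) is an assumption, and the classifier and summarizer are keyword stubs standing in for the separate trained models the company describes. The sketch only captures the shape of the flow: create a note on a strong risk signal, keep it for a limited time, and attach it to the prompt only when it exists.

```python
import time
from dataclasses import dataclass, field

# All names and values below are assumptions for illustration only;
# OpenAI has not disclosed thresholds, retention windows, or interfaces.
RISK_THRESHOLD = 0.8          # assumed: only strong signals create a note
SUMMARY_TTL_SECONDS = 86400   # assumed retention window, not disclosed

@dataclass
class SafetySummary:
    """A short-lived safety note, not long-term memory about the user."""
    text: str
    created_at: float = field(default_factory=time.time)

    def expired(self) -> bool:
        return time.time() - self.created_at > SUMMARY_TTL_SECONDS

def risk_score(message: str) -> float:
    """Keyword stub standing in for a trained safety classifier."""
    signals = ("hurt myself", "end it all", "make them pay")
    return 1.0 if any(s in message.lower() for s in signals) else 0.0

def summarize_for_safety(history: list[str]) -> str:
    """Stub standing in for the separate safety-reasoning model."""
    return f"Possible distress signals across {len(history)} recent messages."

class SafetyContextStore:
    """Keeps per-user safety notes and silently drops expired ones."""
    def __init__(self) -> None:
        self._notes: dict[str, list[SafetySummary]] = {}

    def record(self, user_id: str, history: list[str]) -> None:
        note = SafetySummary(summarize_for_safety(history))
        self._notes.setdefault(user_id, []).append(note)

    def active_notes(self, user_id: str) -> list[str]:
        live = [n for n in self._notes.get(user_id, []) if not n.expired()]
        self._notes[user_id] = live  # expired notes are forgotten
        return [n.text for n in live]

def build_prompt(store: SafetyContextStore, user_id: str,
                 history: list[str], new_message: str) -> str:
    # Create a note only on a strong signal, not for every conversation.
    if risk_score(new_message) >= RISK_THRESHOLD:
        store.record(user_id, history + [new_message])
    notes = store.active_notes(user_id)
    preamble = "\n".join(f"[safety context] {n}" for n in notes)
    # When no note exists, the prompt passes through unchanged.
    return f"{preamble}\n{new_message}" if preamble else new_message

if __name__ == "__main__":
    store = SafetyContextStore()
    # A message with a strong signal creates a note...
    print(build_prompt(store, "u1", [], "lately I want to end it all"))
    # ...and a later, neutral message still carries the short-lived context.
    print(build_prompt(store, "u1", [], "can we talk about something else?"))
```

The last line of build_prompt is the design point worth noticing: when no note exists, the request is untouched, which matches OpenAI's claim that ordinary conversations are unaffected.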
OpenAI also emphasizes that the system was not developed by the safety team alone. The work involved psychiatrists and psychologists from the Global Physician Network, including specialists in forensic psychology, suicide prevention, and self-harm prevention. They helped determine at which moments safety summaries should be created, how much previous context is truly useful, and how long the model should take it into account when responding. This is an important detail: the company relied not only on general heuristics but also on the experience of clinicians who work with exactly these kinds of crisis cases.
What Tests Showed
OpenAI provides several internal metrics. In long scenarios within a single conversation, the share of safe responses increased by 50% in cases related to suicide and self-harm, and by 16% in cases of harm to others. The company separately tested performance across multiple conversations and on several models.
For GPT-4o, which is now the standard model in ChatGPT, safe responses improved by 52% in scenarios of harm to others and 39% in scenarios of suicide and self-harm. This shows the system has become better at noticing risk accumulation over time rather than only reacting to obvious red flags. The company also evaluated the quality of the safety summaries themselves.
Based on more than 4,000 internal assessments, the summaries received an average score of 4.93 out of 5 for safety relevance and 4.34 out of 5 for factual accuracy.
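Mechanically, that kind of evaluation reduces to averaging reviewer scores along each axis. The sketch below is a generic illustration with invented records; the field names and data are assumptions, not OpenAI's actual grading tooling:

```python
from statistics import mean

# Hypothetical grading records: each safety summary is scored 1-5 on two
# axes by internal reviewers. Structure and values are illustrative only.
graded = [
    {"relevance": 5, "accuracy": 5},
    {"relevance": 5, "accuracy": 4},
    {"relevance": 4, "accuracy": 4},
    # ...in OpenAI's case, more than 4,000 such assessments
]

avg_relevance = mean(g["relevance"] for g in graded)  # reported: 4.93 / 5
avg_accuracy = mean(g["accuracy"] for g in graded)    # reported: 4.34 / 5
print(f"safety relevance: {avg_relevance:.2f} / 5")
print(f"factual accuracy: {avg_accuracy:.2f} / 5")
```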
At the same time, OpenAI checked whether adding this context harms ordinary conversations. According to internal tests, responses in everyday chats remained broadly comparable, and no notable user preference emerged between variants with safety summaries and without them. In other words, the bet is on more precise caution without a noticeable drop in quality in normal scenarios.
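The check OpenAI describes resembles a standard pairwise preference test; the sketch below is a generic version of that method with invented labels and data, not the company's actual evaluation harness. Raters see a response produced with the safety summary and one produced without it, and the aggregate shares show whether users notice a difference:

```python
from collections import Counter

# Hypothetical pairwise judgments: for each prompt, a rater prefers the
# response generated with safety summaries ("with"), without ("without"),
# or declares a tie. Data is illustrative only.
judgments = ["tie", "with", "without", "tie", "tie", "without", "with"]

counts = Counter(judgments)
total = len(judgments)
for label in ("with", "without", "tie"):
    print(f"{label:>8}: {counts[label] / total:.0%}")
# Roughly equal "with" and "without" shares would match OpenAI's report
# of no notable preference between the two variants.
```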
What It Means
OpenAI is moving toward more robust use of previous context, not for personalization but for safety in rare critical situations. If the approach truly scales without excessive false positives, ChatGPT will be able to handle more carefully those complex conversations where risk becomes clear only through a chain of messages. For the industry, this is an important signal: safety increasingly depends not on a single request but on the model's ability to see how a situation develops over time.