OpenAI updated ChatGPT to more accurately detect risk in sensitive conversations
OpenAI has updated ChatGPT’s safeguards for sensitive conversations. The model now does a better job of noticing when risk emerges not in a single message, but builds up gradually over the course of a dialogue or even across separate chats.

OpenAI described safety updates to ChatGPT that help the model better understand context in sensitive conversations. The system has become more accurate at noticing when risk doesn't manifest immediately but accumulates as the dialogue progresses or even across separate chats.
Why Context Matters
Taken in isolation, a user's message might be neutral or ambiguous, and without the preceding exchanges such a request looks harmless. But if there were earlier signs of distress, talk of self-harm, or hints at harming others, its meaning changes dramatically. OpenAI focused the update precisely on such cases: the model was trained to connect signals across multiple messages and to heighten caution not in every conversation indiscriminately, but only where genuinely alarming signs appear.
The company says these are rare but critically important scenarios, primarily involving suicide, self-harm, and threats to others. In such situations, ChatGPT should not merely respond formally: it should refuse dangerous details in time, de-escalate the conversation, and gently steer the user toward safer sources of help. The goal of the update is not to make the model overly anxious, but to teach it to distinguish ordinary conversations from genuinely risky episodes.
What Changed
The key innovation is safety summaries: brief factual notes about important safety context. They are created by a separate model trained for safety-reasoning tasks and used only in the rare cases where there is a serious risk signal. According to OpenAI's description, these notes are not general personalization and do not become long-term memory about the user: they are stored for a limited time and applied only when past context is truly needed for a safer response. In practice, the summaries are meant to do the following (a rough sketch of how such a flow might look in code appears after the list):
- Match signals from current and past messages
- Help account for risk across separate chats
- Signal the model when conversation de-escalation is needed
- Strengthen refusal of dangerous request details
- Redirect the user toward safer alternatives
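OpenAI has not published implementation details, so the following Python sketch is purely illustrative. Every name in it (SafetyContextStore, risk_score, SUMMARY_TTL_SECONDS, the threshold value) is an assumption, and the classifier and summarizer are keyword stubs standing in for the separate trained models the company describes. The sketch only captures the shape of the flow: create a note on a strong risk signal, keep it for a limited time, and attach it to the prompt only when it exists.

```python
import time
from dataclasses import dataclass, field

# All names and values below are assumptions for illustration only;
# OpenAI has not disclosed thresholds, retention windows, or interfaces.
RISK_THRESHOLD = 0.8          # assumed: only strong signals create a note
SUMMARY_TTL_SECONDS = 86400   # assumed retention window, not disclosed

@dataclass
class SafetySummary:
    """A short-lived safety note, not long-term memory about the user."""
    text: str
    created_at: float = field(default_factory=time.time)

    def expired(self) -> bool:
        return time.time() - self.created_at > SUMMARY_TTL_SECONDS

def risk_score(message: str) -> float:
    """Keyword stub standing in for a trained safety classifier."""
    signals = ("hurt myself", "end it all", "make them pay")
    return 1.0 if any(s in message.lower() for s in signals) else 0.0

def summarize_for_safety(history: list[str]) -> str:
    """Stub standing in for the separate safety-reasoning model."""
    return f"Possible distress signals across {len(history)} recent messages."

class SafetyContextStore:
    """Keeps per-user safety notes and silently drops expired ones."""
    def __init__(self) -> None:
        self._notes: dict[str, list[SafetySummary]] = {}

    def record(self, user_id: str, history: list[str]) -> None:
        note = SafetySummary(summarize_for_safety(history))
        self._notes.setdefault(user_id, []).append(note)

    def active_notes(self, user_id: str) -> list[str]:
        live = [n for n in self._notes.get(user_id, []) if not n.expired()]
        self._notes[user_id] = live  # expired notes are forgotten
        return [n.text for n in live]

def build_prompt(store: SafetyContextStore, user_id: str,
                 history: list[str], new_message: str) -> str:
    # Create a note only on a strong signal, not for every conversation.
    if risk_score(new_message) >= RISK_THRESHOLD:
        store.record(user_id, history + [new_message])
    notes = store.active_notes(user_id)
    preamble = "\n".join(f"[safety context] {n}" for n in notes)
    # When no note exists, the prompt passes through unchanged.
    return f"{preamble}\n{new_message}" if preamble else new_message

if __name__ == "__main__":
    store = SafetyContextStore()
    # A message with a strong signal creates a note...
    print(build_prompt(store, "u1", [], "lately I want to end it all"))
    # ...and a later, neutral message still carries the short-lived context.
    print(build_prompt(store, "u1", [], "can we talk about something else?"))
```

The last line of build_prompt is the design point worth noticing: when no note exists, the request is untouched, which matches OpenAI's claim that ordinary conversations are unaffected.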
OpenAI also emphasizes that the system was not developed by the safety team alone. The work involved psychiatrists and psychologists from the Global Physician Network, including specialists in forensic psychology, suicide prevention, and self-harm prevention. They helped determine at which moments safety summaries should be created, how much previous context is truly useful, and how long the model should take it into account when responding. This is an important detail: the company relied not only on general heuristics but also on the experience of clinicians who work with exactly these kinds of crisis cases.
What Tests Showed
OpenAI provides several internal metrics. In long scenarios within a single conversation, the share of safe responses increased by 50% in cases related to suicide and self-harm, and by 16% in cases of harm to others. The company separately tested performance across multiple conversations and on several models.
For GPT-4o, which is now the standard model in ChatGPT, safe responses improved by 52% in scenarios of harm to others and 39% in scenarios of suicide and self-harm. This shows the system has become better at noticing risk accumulation over time rather than only reacting to obvious red flags. The company also evaluated the quality of the safety summaries themselves.
Based on more than 4,000 internal assessments, the summaries received an average score of 4.93 out of 5 for safety relevance and 4.34 out of 5 for factual accuracy.
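Mechanically, that kind of evaluation reduces to averaging reviewer scores along each axis. The sketch below is a generic illustration with invented records; the field names and data are assumptions, not OpenAI's actual grading tooling:

```python
from statistics import mean

# Hypothetical grading records: each safety summary is scored 1-5 on two
# axes by internal reviewers. Structure and values are illustrative only.
graded = [
    {"relevance": 5, "accuracy": 5},
    {"relevance": 5, "accuracy": 4},
    {"relevance": 4, "accuracy": 4},
    # ...in OpenAI's case, more than 4,000 such assessments
]

avg_relevance = mean(g["relevance"] for g in graded)  # reported: 4.93 / 5
avg_accuracy = mean(g["accuracy"] for g in graded)    # reported: 4.34 / 5
print(f"safety relevance: {avg_relevance:.2f} / 5")
print(f"factual accuracy: {avg_accuracy:.2f} / 5")
```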
At the same time, OpenAI checked whether adding this context harms ordinary conversations. According to internal tests, responses in everyday chats remained broadly comparable, and no notable user preference emerged between variants with safety summaries and without them. In other words, the bet is on more precise caution without a noticeable drop in quality in normal scenarios.
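The check OpenAI describes resembles a standard pairwise preference test; the sketch below is a generic version of that method with invented labels and data, not the company's actual evaluation harness. Raters see a response produced with the safety summary and one produced without it, and the aggregate shares show whether users notice a difference:

```python
from collections import Counter

# Hypothetical pairwise judgments: for each prompt, a rater prefers the
# response generated with safety summaries ("with"), without ("without"),
# or declares a tie. Data is illustrative only.
judgments = ["tie", "with", "without", "tie", "tie", "without", "with"]

counts = Counter(judgments)
total = len(judgments)
for label in ("with", "without", "tie"):
    print(f"{label:>8}: {counts[label] / total:.0%}")
# Roughly equal "with" and "without" shares would match OpenAI's report
# of no notable preference between the two variants.
```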
What It Means
OpenAI is moving toward more robust use of previous context, not for personalization but for safety in rare critical situations. If the approach truly scales without excessive false positives, ChatGPT will be able to handle more carefully those complex conversations where risk becomes clear only through a chain of messages. For the industry, this is an important signal: safety increasingly depends not on a single request but on the model's ability to see how a situation develops over time.