OpenAI explained what data ChatGPT uses for training and how it protects privacy
OpenAI detailed how ChatGPT uses data for training and what privacy controls users have. The company says it applies a Privacy Filter to mask personal information in training data.

On May 6, OpenAI published a detailed explanation of how ChatGPT acquires knowledge about the world while trying not to involve unnecessary personal data in training. The company simultaneously described data sources, internal filters, and settings that users can use themselves to limit the use of their conversations.
Where the Data Comes From
In a post, OpenAI divides data sources into several categories. To train the models that underlie ChatGPT, the company uses publicly available information from the internet, data from partnerships, as well as materials that were provided or generated by users, contractors, and researchers. The idea is for the model to learn general patterns, facts, and connections between topics, rather than memorize individual personal stories.
According to OpenAI, it is precisely this broad set of sources that helps make answers more useful, stable, and safe. The company separately clarifies an important detail: when it comes to content from the open internet, only materials that are freely and openly accessible are used for training. As examples, OpenAI cites public posts, blogs, and discussions on open forums.
This does not eliminate questions about the limits of acceptable use of open data, but it shows that the company is trying to formalize a rule: not everything on the internet is automatically suitable for training, and content behind access restrictions is excluded.
How They Remove Personal Information
Before data enters training, OpenAI runs it through a set of protective mechanisms designed to reduce the amount of personal information in datasets. The main one is the Privacy Filter, a tool for finding and masking personal information in text. According to the company, this filter is applied at several stages of the process, including to public datasets and to user conversations when a person has enabled the Improve the model for everyone setting.
OpenAI also says it has made the Privacy Filter free for other developers so the approach can be used beyond ChatGPT. A separate layer of protection concerns not training but ChatGPT's responses themselves: the service is designed to refuse requests for private or sensitive information about specific people, although OpenAI directly acknowledges that errors are still possible.
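OpenAI does not describe the Privacy Filter's internals in the post, but the general idea of PII masking can be sketched in a few lines. Everything below is an illustrative assumption, not the actual tool: production systems typically rely on trained entity-recognition models rather than regular expressions, and the `mask_pii` helper, patterns, and placeholder tokens are invented for this example.

```python
import re

# Hypothetical PII-masking sketch. Real filters use ML-based entity
# recognition; these regexes and placeholder names are assumptions.
# SSN is listed before PHONE so the broader phone pattern does not
# consume SSN-shaped strings first.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text: str) -> str:
    """Replace each matched span with a typed placeholder like [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Reach me at jane.doe@example.com or 555-123-4567."))
# → Reach me at [EMAIL] or [PHONE].
```

The design point this sketch captures is that masking happens before text enters a training corpus: the model would only ever see the placeholder, not the original value.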
If personal information does appear in a response and a person considers it inaccurate or inappropriate, they can submit a request through the privacy portal. At the same time, the company emphasizes that privacy protections and responses to serious risks, such as credible threats of violence, must work in parallel rather than interfere with each other.
"Privacy protection is a central part of how we build ChatGPT."
What Settings Are Available
The most practical part of the post is the list of user toggles that let you decide how much data to share with the system. OpenAI emphasizes that control over conversations is not buried deep in documentation but sits directly in the ChatGPT interface. In other words, this is not just about company principles but about concrete actions: you can exclude new chats from training, clear memory, or switch to a separate temporary mode for more sensitive requests.
- In Settings -> Data Controls you can disable the Improve the model for everyone option. After that, new chats will remain in the history, but will not be used to train models.
- Temporary Chat mode launches a one-time conversation: it is not saved in history, does not create memory, and does not improve models.
- Temporary chats are stored for 30 days for security purposes, then deleted.
- The Memory function can be viewed, edited, cleared, or completely disabled if you don't want ChatGPT to remember past details.
- Users can also export their data, delete their account, and submit a request through the privacy portal.
There is also a direct warning: do not send ChatGPT sensitive information you would not be willing to share, even in the context of system review or processing. This is an important caveat, because many people treat the chat interface as a private notebook or a safe confidant by default. OpenAI, by contrast, tries to convey a more sober model of use: the user has control tools, but responsibility for what they enter into the service does not disappear.
What This Means
OpenAI is essentially trying to move the privacy conversation from general promises to a set of concrete rules and toggles. For users this is useful: it is now clearer what data can participate in training, how to opt out, and how a regular chat differs from Temporary Chat. For the market, it is a signal that trust in AI products increasingly depends not only on model quality but also on transparency in handling personal information.