OpenAI is devoting its core resources to building a fully autonomous AI researcher
AI-processed from MIT Technology Review; edited by Hamidun News
OpenAI is shifting its research focus and betting not on another chatbot, but on an autonomous AI researcher. The company wants to develop a system that can work through complex problems for weeks with minimal human intervention and eventually become a full-fledged 'laboratory in a data center.'
New Priority
According to OpenAI's chief scientist, Jakub Pachocki, the project is becoming a shared company-wide goal around which reasoning models, agent systems, and interpretability tools should converge. The first milestone is to demonstrate an 'autonomous AI intern' by September 2026: a system that can be handed a limited set of research tasks and work on them for several days. The following phase is planned for 2028: a fully automated multi-agent system capable of tackling problems too large or complex for a single team of people.
For OpenAI, this is not merely an elegant research direction. The company is no longer the sole pace-setter in the market as it was in the early GPT days, and it now competes with Anthropic and Google DeepMind not only on model quality but also on the product stack built around the models.
The bet on an AI researcher looks like an attempt to go beyond a chat and coding assistant and create the next layer of automation: a system that doesn't just answer queries but carries out extended work, from problem formulation to intermediate results.
"Essentially, you'll have an entire research laboratory inside a data center," says Pachocki.
What's Already Ready
OpenAI believes that an early prototype of this approach is already visible in Codex. This agentic tool can write and execute code, analyze documents, create visualizations, compile summaries, and handle long chains of technical steps that humans previously had to work through manually.
Pachocki explicitly states that the company wants to take what has already succeeded in programming and transfer this mode of operation to a broader class of problems. The logic is straightforward: if an agent can independently handle substantial portions of engineering work, the same approach can be extended to science and applied research.
OpenAI expects the future system to have not a single 'magical' function but a set of research skills that can be chained together:
- Parse large volumes of text, code, and notes
- Propose hypotheses and verification approaches
- Break large problems into subtasks and manage them in parallel
- Work toward solutions in mathematics, physics, biology, chemistry, and public-policy scenarios
According to OpenAI, the technical foundation rests on two developments. First, general-purpose models have become better at maintaining focus and context over longer periods: the transition from GPT-3 to GPT-4 already demonstrated a notable gain in 'endurance' on a single task.
Second, reasoning models that work through problems step by step and can backtrack after an error are better suited to long autonomous cycles. In addition, models are trained specifically on complex mathematical and coding tasks so that they learn to maintain extended context and manage multiple subtasks simultaneously.
Where the Weaknesses Are
The most obvious question is not ambition but reliability. Independent researchers acknowledge that the idea of an autonomous AI scientist is a logical continuation of the success of programming agents, but they warn that once a system must execute not just one step but a long chain of actions, the small chance of error at each step compounds across the whole chain.
In tests by the Allen Institute for AI, strong models already demonstrated good performance on scientific tasks, yet still made regular errors. Even if newer versions have improved, the problem doesn't disappear: proposing a strong idea is one thing; consistently bringing a multi-day investigation to a correct conclusion is another.
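The compounding concern can be illustrated with simple arithmetic. The sketch below is not from OpenAI or the Allen Institute; it assumes, purely for illustration, that every step succeeds independently with the same probability, and the function name and the 99% figure are hypothetical.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumption (illustrative only): steps are independent and share the
    same success probability, so the overall probability is p ** n.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Hypothetical agent that gets each individual step right 99% of the time.
    for steps in (1, 10, 100, 1000):
        overall = chain_success_probability(0.99, steps)
        print(f"{steps:>5} steps -> {overall:.1%} chance of no error anywhere")
```

Under these assumptions, an agent that is 99% reliable per step finishes a 100-step chain without any error only about 37% of the time. Real agents can detect and repair some mistakes, so the figure illustrates the trend rather than predicting any particular system.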
There is also a harder class of risk. Pachocki says that OpenAI is betting on chain-of-thought monitoring: the model leaves working notes as it solves a problem, while other models or human researchers watch those notes to ensure it doesn't drift into dangerous or simply incorrect behavior.
But he himself admits that complete control is not yet in place. An autonomous system can misinterpret instructions, exceed task boundaries, become a target for attack, or gain access to overly powerful tools.
Therefore, truly powerful versions of such agents, in his view, will need to be kept in isolated sandbox environments and restricted until oversight mechanisms become significantly more reliable.
What This Means
If OpenAI even partially executes this plan, the next leap in AI will be tied not to more conversational chatbots, but to systems that can be assigned portions of real intellectual work for days or weeks.
But this is precisely where the real test lies: the market readily believes in rapid agent progress, while science and safety will demand not impressive demos but reproducible results under human oversight.