
OpenAI details Codex safeguards: sandboxes, approvals, and audits of agent actions

OpenAI showed how it runs Codex internally without fully trusting the agent: sandboxes, approvals for actions outside set boundaries, network policies, and context-rich audit logs.

Source: OpenAI Blog. Collage: Hamidun News.

On May 8, 2026, OpenAI explained how it restricts Codex within its workflows. The idea is simple: a code agent should accelerate development, but not gain uncontrolled access to files, networks, and infrastructure.

Boundaries for Codex

The first layer of protection is the sandbox. It sets technical boundaries: which directories Codex may write to, whether network access is permitted, and which paths stay protected. An approval policy sits on top of this: if the agent needs to do something outside the permitted environment, it must request confirmation. A user can approve a single specific action or grant permission for that type of action for the rest of the session. This reduces the risk of accidental and opaque operations.
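To make the layering concrete, here is a minimal Python sketch of a sandbox policy with an approval gate on top. The class, method, and path names are illustrative assumptions, not OpenAI's actual implementation:

```python
# Minimal sketch of the layered model described above: a sandbox policy
# defines hard technical boundaries, and anything outside them falls
# through to user approval. All names here are hypothetical.
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class Decision(Enum):
    ALLOW = "allow"            # inside the sandbox, runs silently
    ASK_USER = "ask_user"      # outside the sandbox, needs confirmation
    DENY = "deny"              # touches a protected path, always blocked


@dataclass
class SandboxPolicy:
    writable_roots: list[Path]    # directories open for writing
    protected_paths: list[Path]   # paths that stay off-limits
    network_allowed: bool = False # outbound network access

    def check_write(self, target: Path) -> Decision:
        target = target.resolve()
        # Protected paths win even when nested inside a writable root.
        if any(target.is_relative_to(p) for p in self.protected_paths):
            return Decision.DENY
        if any(target.is_relative_to(r) for r in self.writable_roots):
            return Decision.ALLOW
        # Outside the permitted environment: the agent must request approval.
        return Decision.ASK_USER


policy = SandboxPolicy(
    writable_roots=[Path("/work/repo")],
    protected_paths=[Path("/work/repo/.git/hooks")],
)
print(policy.check_write(Path("/work/repo/src/main.py")))  # Decision.ALLOW
print(policy.check_write(Path("/etc/hosts")))              # Decision.ASK_USER
```

Checking protected paths before writable roots matters: a deny rule nested inside a writable directory should still hold.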

Separately, OpenAI uses an Auto-review mode. When Codex is about to cross a sandbox boundary, it passes the action plan and recent context to a separate auto-approval subagent. That subagent can automatically permit low-risk requests and, in some cases, even more sensitive ones if the user already holds a sufficient authorization level. As a result, routine tasks do not slow work down, while potentially dangerous actions are still caught at the review stage.
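A hedged sketch of what such an auto-approval subagent could look like. In practice the risk assessment would be a model call rather than the keyword heuristic used here, and the clearance levels are invented for illustration:

```python
# Hypothetical auto-review gate: a boundary-crossing action plan and its
# context go to a reviewer that can approve low-risk requests on its own.
# The hints and clearance threshold are assumptions, not OpenAI's rules.
from enum import Enum


class Verdict(Enum):
    AUTO_APPROVE = "auto_approve"
    ESCALATE_TO_USER = "escalate"


LOW_RISK_HINTS = ("read", "list", "view", "describe", "logs")       # assumption
HIGH_RISK_HINTS = ("delete", "push --force", "chmod", "curl | sh")  # assumption


def auto_review(plan: str, context: str, user_clearance: int) -> Verdict:
    """Decide whether a boundary-crossing action can proceed without a prompt."""
    text = f"{plan}\n{context}".lower()
    if any(hint in text for hint in HIGH_RISK_HINTS):
        # Sensitive actions may still pass automatically, but only for
        # users who already hold a sufficient authorization level.
        return Verdict.AUTO_APPROVE if user_clearance >= 3 else Verdict.ESCALATE_TO_USER
    if any(hint in text for hint in LOW_RISK_HINTS):
        return Verdict.AUTO_APPROVE
    # Anything unrecognized is treated as risky and surfaced for review.
    return Verdict.ESCALATE_TO_USER
```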

"Keep an agent within clear technical boundaries, accelerate low-risk actions, and explicitly highlight risky ones" — this is how

OpenAI describes the goal of this scheme.

Network, Access, and Rules

The second layer is access management. OpenAI does not give Codex unrestricted outbound internet access: the network policy permits expected destinations, blocks unwanted ones, and requires approval for unfamiliar domains. Even search and web fetch can be limited to cached responses. CLI and MCP OAuth credentials are stored in the system keyring, login through ChatGPT is mandatory, and access is tied to a specific enterprise workspace. This keeps agent activity within corporate control boundaries; a sketch of such a domain policy follows the list below.

  • Permitted sandbox modes: read-only and workspace-write only
  • Write access is enabled automatically only for working directories known in advance
  • The network policy can route traffic through a proxy, allow localhost, and block specific domains
  • Known addresses such as *.openai.com can pass without manual approval
  • Requirements set by administrators cannot be disabled on the user side
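The domain policy from the list above might be modeled like this. The allow and block lists, and the use of fnmatch-style wildcards, are assumptions for illustration:

```python
# Sketch of the outbound network policy described above: expected
# destinations pass, blocked ones fail, unfamiliar domains go to approval.
from enum import Enum
from fnmatch import fnmatch


class NetDecision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    NEEDS_APPROVAL = "needs_approval"


ALLOWED = ["*.openai.com", "localhost", "github.com"]  # expected destinations
BLOCKED = ["*.pastebin.com"]                           # hypothetical example


def check_destination(host: str) -> NetDecision:
    if any(fnmatch(host, pattern) for pattern in BLOCKED):
        return NetDecision.BLOCK
    if any(fnmatch(host, pattern) for pattern in ALLOWED):
        return NetDecision.ALLOW
    # Unfamiliar domain: the proxy holds the request until a human approves.
    return NetDecision.NEEDS_APPROVAL


print(check_destination("api.openai.com"))    # NetDecision.ALLOW
print(check_destination("evil.example.net"))  # NetDecision.NEEDS_APPROVAL
```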

There are also separate rules for shell commands. Codex does not treat all commands as equally safe: standard read-only operations can run without confirmation even outside the sandbox, while dangerous patterns are blocked or sent for review. In OpenAI's example, reading pull requests via gh pr view and gh pr list is permitted without extra prompts, as is Kubernetes diagnostics via kubectl get, describe, and logs. The company rolls out the baseline policy through managed configs and requirements, so the same control framework operates in the desktop app, the CLI, and the IDE extension.
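A sketch of prefix-based command rules in the spirit of the gh and kubectl examples. The prefix list mirrors the article's examples; everything else, including the rejection of compound commands, is an assumption:

```python
# A command is auto-approved only when its tokens start with a known
# read-only prefix. Parsing with shlex avoids being fooled by quoting.
import shlex

SAFE_PREFIXES = [
    ["gh", "pr", "view"],
    ["gh", "pr", "list"],
    ["kubectl", "get"],
    ["kubectl", "describe"],
    ["kubectl", "logs"],
]


def is_auto_approved(command: str) -> bool:
    # Reject chaining, piping, and redirection outright: a single safe
    # prefix cannot vouch for a compound command line.
    if any(sym in command for sym in ("&&", "||", ";", "|", ">", "`", "$(")):
        return False
    tokens = shlex.split(command)
    return any(tokens[: len(p)] == p for p in SAFE_PREFIXES)


print(is_auto_approved("gh pr view 1234"))       # True
print(is_auto_approved("kubectl delete pod x"))  # False: goes to review
```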

Logs with Explanation

OpenAI separately emphasizes that restrictions alone are insufficient. Standard security logs show well what happened: a process started, a file changed, a network connection was attempted. But it is hard to reconstruct the intent of the agent and the user from them. Codex can therefore export OpenTelemetry logs with richer agent context: original user requests, approval decisions, results of tool calls, use of MCP servers, and network proxy events showing what was permitted and what was blocked. For Enterprise and Edu customers, this activity is also available through the Compliance Logs Platform, and the logs themselves can be centralized in SIEM and compliance systems.
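As an illustration of what a context-rich record could look like, here is a hypothetical schema. The field names are assumptions, not OpenAI's actual OpenTelemetry attribute names, and a real deployment would export such records through the OpenTelemetry SDK to a SIEM or compliance backend rather than plain logging:

```python
# Illustrative shape of an agent audit record: user request, approval
# decision, tool result, and network proxy events in one structured entry.
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("codex.audit")

record = {
    "session_id": "sess-123",                    # hypothetical identifier
    "user_prompt": "fix the failing unit test",  # original request
    "action": {"type": "shell", "command": "pytest -x"},
    "approval": {"decision": "auto_approved", "by": "auto-review subagent"},
    "tool_result": {"exit_code": 1, "summary": "1 test failed"},
    "network": [{"host": "pypi.org", "decision": "allowed"}],
}
log.info(json.dumps(record))
```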

Within OpenAI, this data feeds an AI agent that performs security triage. When endpoint protection reports strange Codex behavior, the security team looks not only at the alert itself but at the accompanying logs: what the original request was, which tools were launched, what the agent tried to do, what decisions the network layer made, and where approval was required. This makes it faster to distinguish expected behavior from a harmless mistake, and both from a case that truly requires escalation. The same logs are also used operationally: to see how internal Codex usage is growing, which tools are in demand, and where the policy still needs tweaking.
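The triage step might look like the following sketch, which reuses the hypothetical record schema above to pull the agent context behind an endpoint alert:

```python
# Assemble the context a security engineer (or a triage agent) would read:
# all agent records that share the alert's session. Field names follow the
# hypothetical schema sketched earlier, not a real log store API.
def triage_context(alert: dict, log_records: list[dict]) -> dict:
    session = [r for r in log_records if r["session_id"] == alert["session_id"]]
    return {
        "alert": alert["summary"],
        "original_request": session[0]["user_prompt"] if session else None,
        "tools_run": [r["action"] for r in session],
        "approvals": [r["approval"] for r in session],
        "network_decisions": [d for r in session for d in r.get("network", [])],
    }
```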

What This Means

OpenAI is essentially demonstrating a corporate template for code agents: not a smart model instead of controls, but a model within a tightly configured environment with auditing and rules. For companies looking to deploy such assistants, this is a signal that the main product question now is not only about model quality, but also about how well it fits into security, compliance, and internal processes.
