OpenAI Blog→ original

OpenAI launches Safety Bug Bounty for vulnerabilities in agentic AI systems

OpenAI has launched Safety Bug Bounty, a rewards program for security researchers who find vulnerabilities specific to AI systems. It focuses on three…

AI-processed from OpenAI Blog; edited by Hamidun News
OpenAI launches Safety Bug Bounty for vulnerabilities in agentic AI systems
Source: OpenAI Blog. Collage: Hamidun News.
◐ Listen to article

OpenAI has announced a Safety Bug Bounty program — a specialized track within its existing vulnerability rewards system, focused not on classical software bugs, but on risks unique to AI systems. Security researchers who discover and document vulnerabilities in the company's products will be able to receive monetary rewards through the Bugcrowd platform. The principal distinction of Safety Bug Bounty from standard bug bounty programs lies in the subject of the search.

Traditional bug bounty hunts for SQL injections, authentication vulnerabilities, or server infrastructure issues. The new program focuses on three vectors specific to language models: AI capability misuse (bypassing safety filters, using the model for prohibited tasks), prompt injection, and data leakage from conversation context or system instructions. Particular attention is drawn to the focus on agent vulnerabilities.

Over the past eighteen months, OpenAI has actively rolled out agent products — Operator, Deep Research, Responses API with tools for browsing and file operations. An agent that independently visits websites, performs searches, and executes code has a fundamentally larger attack surface than a chatbot. A specially crafted webpage or document can contain hidden instructions that the model will perceive as commands from the user — and execute them.

This class of attacks is called indirect prompt injection. The essence: the attacker does not address the model directly, but embeds malicious instructions in content that the agent processes as data. For example, visiting a compromised website could cause the agent to send an email on behalf of the user, copy confidential data, or modify settings of connected services.

The attack works precisely because many models do not distinguish between trusted system instructions and untrusted external content. The problem of data leakage in the LLM context also requires specific testing methods. This is not about server breaches, but about situations where the model unintentionally reveals the contents of the system prompt, reproduces data from other users through memory mechanisms, or allows reconstruction of fragments of the training dataset through targeted queries.

Traditional penetration testing tools are not suited for such tasks — specialized expertise is needed. By creating a separate track with its own evaluation and payout rules, OpenAI de facto recognizes that AI-specific threats require a separate methodology. This aligns with the position of leading laboratories: Anthropic regularly conducts red-teaming before releasing new models, Google DeepMind publishes research on agent system security, and regulators in the US and EU are beginning to require proof of systematic testing.

The practical significance of the program lies in scaling. Internal security teams are limited in number and prone to blind spots. The external community of researchers is capable of discovering attack vectors that insiders missed, especially with non-standard inputs.

For users of agent products, this means more systematic testing of systems to which they grant access to their browsers, files, and accounts.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…