OpenAI's Codex Security Scanned 1.2 Million Commits, Found Over 10,000 High-Severity Vulnerabilities

OpenAI is moving beyond chatbots and into the trenches of software security. Last week the company rolled out Codex Security, an AI-powered security agent that does not just flag potential vulnerabilities but also validates them in sandboxed environments and proposes working fixes. The results from its first month of beta testing are striking.

The Numbers

During its beta period, Codex Security scanned more than 1.2 million commits across external repositories and identified 792 critical findings and 10,561 high-severity findings. These were not theoretical: they included real vulnerabilities in widely used open-source projects that millions of developers and systems depend on.

GnuPG: CVE-2026-24881 and CVE-2026-24882
GnuTLS: CVE-2025-32988 and CVE-2025-32989
Gogs: CVE-2025-64175 and CVE-2026-25242
Thorium: seven separate CVEs (CVE-2025-35430 through CVE-2025-35436)
OpenSSH, libssh, PHP, and Chromium were also among the affected projects.
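
For a sense of scale, the beta totals reported above can be turned into a rough rate. The short Python sketch below does only that arithmetic. It treats "more than 1.2 million" as exactly 1.2 million commits, which is an assumption, so the per-commit rate it produces is an upper bound rather than a reported figure. The function and variable names are illustrative.

```python
# Figures quoted in the article for the beta period.
COMMITS_SCANNED = 1_200_000   # "more than 1.2 million": treated as a lower bound
CRITICAL_FINDINGS = 792
HIGH_FINDINGS = 10_561


def findings_per_thousand_commits(findings: int, commits: int) -> float:
    """Return the number of findings per 1,000 scanned commits."""
    if commits <= 0:
        raise ValueError("commits must be positive")
    return findings / commits * 1000


total_findings = CRITICAL_FINDINGS + HIGH_FINDINGS
rate = findings_per_thousand_commits(total_findings, COMMITS_SCANNED)
critical_share = CRITICAL_FINDINGS / total_findings

print(f"total critical + high findings: {total_findings}")
print(f"findings per 1,000 commits (upper bound): {rate:.2f}")
print(f"share of those findings rated critical: {critical_share:.1%}")
```

Because the true commit count is higher than 1.2 million, the real rate is somewhat lower than the one printed here.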

How It Works

Unlike traditional static analysis tools that blast developers with hundreds of alerts, many of them false positives, Codex Security takes a three-stage approach designed to cut through the noise:

1. Context Building

The agent first analyzes a repository to understand the project's security-relevant structure. It generates an editable threat model that captures what the system does and where it is most exposed. Rather than running a shallow scan, it builds an understanding of the codebase's architecture that later stages rely on.

2. Vulnerability Discovery and Validation

Using the system context as a foundation, Codex Security identifies potential vulnerabilities and classifies findings based on real-world impact. Critically, flagged issues are then pressure-tested in a sandboxed environment to validate them before they are surfaced to developers.

"When Codex Security is configured with an environment tailored to your project, it can validate potential issues directly in the context of the running system. That deeper validation can reduce false positives even further and enable the creation of working proofs-of-concept." — OpenAI

3. Fix Proposals

In the final stage, the agent proposes patches that align with the system's existing behavior to reduce the risk of regressions, which makes fixes easier to review and deploy.
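
The three stages described above can be pictured as a simple pipeline. The Python sketch below is purely illustrative and is not OpenAI's implementation or API: every name in it (ThreatModel, Finding, run_pipeline, and the four callables passed in) is hypothetical and exists only to show how context building, sandbox validation, and fix proposal might hand results to one another.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ThreatModel:
    """Hypothetical stage 1 output: what the system does and where it is exposed."""
    summary: str
    exposed_surfaces: list[str] = field(default_factory=list)


@dataclass
class Finding:
    """Hypothetical record for one candidate vulnerability."""
    location: str
    description: str
    severity: str                      # e.g. "critical" or "high"
    validated: bool = False
    proposed_patch: Optional[str] = None


def run_pipeline(
    repo_path: str,
    build_context: Callable[[str], ThreatModel],
    discover: Callable[[str, ThreatModel], list[Finding]],
    validate_in_sandbox: Callable[[Finding], bool],
    propose_fix: Callable[[Finding, ThreatModel], str],
) -> list[Finding]:
    """Run the three stages in order and return only validated findings."""
    # Stage 1: build the threat model for the repository.
    threat_model = build_context(repo_path)

    # Stage 2: discover candidates, then keep only those the sandbox confirms.
    surfaced: list[Finding] = []
    for finding in discover(repo_path, threat_model):
        finding.validated = validate_in_sandbox(finding)
        if not finding.validated:
            continue  # unconfirmed candidates are never shown to developers

        # Stage 3: attach a proposed patch to each confirmed finding.
        finding.proposed_patch = propose_fix(finding, threat_model)
        surfaced.append(finding)
    return surfaced
```

The design point the sketch tries to capture is the ordering: validation sits between discovery and reporting, so a candidate that cannot be confirmed in the sandbox is dropped before a developer ever sees it.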

The False Positive Problem

One of the most significant claims from OpenAI is the reduction in false positives. Traditional security scanners are notorious for causing alert fatigue, drowning developers in warnings that turn out to be non-issues. OpenAI says false positive rates dropped by more than 50% across all repositories during the beta period, with precision improving over repeated scans.
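
To make the relationship between false positives and precision concrete, here is a small Python sketch. It assumes that "false positive rate" means the share of surfaced alerts that turn out not to be real issues, and that the reported drop is relative; the article does not spell out either point. The alert counts in the usage lines are invented for illustration and are not OpenAI's data.

```python
def false_positive_share(true_positives: int, false_positives: int) -> float:
    """Fraction of surfaced alerts that are not real issues."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no alerts to evaluate")
    return false_positives / total


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of surfaced alerts that are real issues."""
    return 1.0 - false_positive_share(true_positives, false_positives)


def relative_drop(before: float, after: float) -> float:
    """Relative reduction from `before` to `after` (0.5 means a 50% drop)."""
    if before == 0:
        raise ValueError("cannot compute a relative drop from zero")
    return (before - after) / before


# Hypothetical scans of 100 alerts each, before and after tuning.
fp_before = false_positive_share(true_positives=60, false_positives=40)
fp_after = false_positive_share(true_positives=80, false_positives=20)

print(f"false positive share: {fp_before:.0%} -> {fp_after:.0%}")
print(f"relative drop: {relative_drop(fp_before, fp_after):.0%}")
print(f"precision: {precision(60, 40):.0%} -> {precision(80, 20):.0%}")
```

Under these assumptions, halving the false positive share and raising precision are two views of the same change, which is why the two claims appear together.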

Codex Security represents an evolution of Aardvark, which OpenAI unveiled in private beta in October 2025. The new version is available as a research preview to ChatGPT Pro, Enterprise, Business, and Education customers through the Codex web interface, with free usage for the next month.

The Competitive Landscape

The launch comes weeks after Anthropic released Claude Code Security, its own AI-powered vulnerability scanner. The race to own the AI-assisted security toolchain is heating up, with both companies betting that autonomous code review is one of the clearest near-term use cases for AI agents in enterprise workflows.

For security teams already overwhelmed by alert volume and the expanding attack surface driven by AI-generated code itself, tools like Codex Security could represent a meaningful shift — from reactive scanning to proactive, context-aware security engineering.