Two AI Cybersecurity Giants, Nearly Identical Scores — and the Same Three Partners: The Real Battle Is Over WhoAI-generated image for AI Universe News

Two AI Cybersecurity Giants, Nearly Identical Scores — and the Same Three Partners: The Real Battle Is Over Who Controls Access

When OpenAI launched Daybreak and Anthropic introduced Project Glasswing six weeks earlier, the headline numbers told a story of near-parity: GPT-5.5 achieved a 71.4% average pass rate on Expert-level capture-the-flag (CTF) tasks — structured hacking challenges used to measure offensive security skill — while Claude Mythos Preview scored 68.6% on the identical suite. That 2.8-point gap sits within any reasonable margin of error, meaning the two most prominent AI cybersecurity initiatives on the market are, by the most rigorous available measure, functionally equivalent in raw capability.

What separates them, then, is not what the models can do — it is who gets to use them, under what conditions, and through which commercial envelope. Cisco, CrowdStrike, and Palo Alto Networks signed on as launch partners for both Daybreak and Glasswing, a detail that signals less about loyalty and more about a deliberate dual-stack hedge by the security industry’s largest vendors. The real competition is not benchmark points; it is the architecture of access.

Both initiatives share the same stated mission: find vulnerabilities, validate exploits, and help defenders patch systems before attackers reach them. But the mechanisms each company built around that mission — Daybreak’s tiered trust framework versus Glasswing’s narrower consortium model — reveal two distinct theories about how AI capability gets monetized in enterprise security.

Benchmarks That Converge — and a Simulation That Diverges

The UK AI Security Institute (AISI) evaluated GPT-5.5 against the same 95-task capture-the-flag suite it used on Mythos Preview, according to AISI’s primary documentation. The results place both models in a tier that their predecessors could not reach: GPT-5.4 managed only a 52.4% pass rate on Expert-tier tasks, and Opus 4.7 reached 48.6%. The jump from the 48–52% range to the 68–71% range is not incremental — it represents a qualitative shift in what these systems can autonomously accomplish against hardened security challenges.

AISI also ran “The Last Ones,” a 32-step simulated corporate network attack that a human expert needs around 20 hours to complete. Here the numbers diverge slightly: Mythos Preview finished the simulation three times in ten attempts, while GPT-5.5 finished it two times in ten. Neither model is reliable enough to be treated as an autonomous attacker — but both are capable enough to function as a force multiplier for a skilled human operator. AISI frames GPT-5.5’s results as evidence that cyber capability is emerging as a by-product of general autonomy and coding improvements, not as a purpose-built offensive feature — a distinction that matters for how regulators will eventually classify these systems.

Codex Security, the code-aware agent bundled with Daybreak, adds tooling around repository scoping, patch generation, and CI (continuous integration) pipeline integration — meaning it can not only find a vulnerability but propose and stage a fix within an existing development workflow. Mythos Preview is exposed through standard Anthropic surfaces, without a dedicated agent harness of equivalent specificity.

InitiativeKey DifferentiatorBest For
Daybreak (GPT-5.5 + Codex Security)Tiered trust framework, audit trail, verification workflow, GPT-5.5-Cyber explicitly tiered for offensive workflowsBroad rollout across many security teams; red teaming and penetration testing at scale
Glasswing (Claude Mythos Preview)Consortium model, credit commitment, disclosure protocols, open-source maintainer access layer; $4M earmarked for open-source security donationsDeep integration with a small set of critical systems; hands-on Anthropic engagement via Linux Foundation partnership
Open-weight contendersReplicable flagship vulnerability analysis at a fraction of the costCost-sensitive teams willing to manage their own infrastructure and accept lower vendor support

The Walled Garden Strategy: Access as the Real Product

The fact that Cisco, CrowdStrike, and Palo Alto Networks are launch partners for both Daybreak and Glasswing is not a coincidence — it is a calculated hedge. These vendors are not betting on one model winning; they are ensuring they have a seat at both tables while the access architectures solidify. Daybreak runs through the OpenAI API with tiered access to GPT-5.5 variants, scaling to more verified defenders. Glasswing operates as a narrower consortium, with Mythos available on Bedrock, Vertex, and Foundry — three separate cloud surfaces — while Anthropic maintains hands-on engagement with consortium members.

This dual-stack reality forces security teams to evaluate not just model performance but platform dependency. Glasswing’s open-source maintainer access layer, backed by the Linux Foundation as a launch partner and $4M earmarked for open-source security donations, signals a deliberate attempt to embed Anthropic’s model into the infrastructure layer of the software supply chain. Daybreak counters with GPT-5.5-Cyber, explicitly tiered for offensive workflows including red teaming and penetration testing — a commercial framing that Glasswing also supports for vetted penetration-testing work, but without the same explicit product tier structure.

The pattern is familiar. Cursor and Replit went through a similar dynamic with coding agents — early parity in capability, followed by differentiation through ecosystem lock-in and workflow integration. Google is already running Mythos Preview through Vertex AI and shipping its own Security Operations agents, adding a third commercial envelope around the same underlying model. Meanwhile, open-weight contenders can replicate flagship vulnerability analysis at a fraction of the cost, which means the tiered trust frameworks and partner rosters are not just commercial strategy — they are the primary moat once model capability converges.

📊 Key Numbers

  • GPT-5.5 Expert CTF pass rate: 71.4% average on the AISI 95-task capture-the-flag suite
  • Mythos Preview Expert CTF pass rate: 68.6% on the identical 95-task suite — a 2.8-point gap within margin of error
  • GPT-5.4 Expert CTF pass rate: 52.4% — 19 points below GPT-5.5, establishing the generational gap
  • Opus 4.7 Expert CTF pass rate: 48.6% — the prior Anthropic baseline, now surpassed by 20 points
  • “The Last Ones” completion — Mythos Preview: 3 of 10 attempts on the 32-step, ~20-hour corporate network simulation
  • “The Last Ones” completion — GPT-5.5: 2 of 10 attempts on the same simulation
  • Glasswing open-source funding: $4M earmarked for open-source security donations via Linux Foundation partnership
  • Shared launch partners: 3 — Cisco, CrowdStrike, and Palo Alto Networks appear on both Daybreak and Glasswing rosters

🔍 Context

The UK AI Security Institute (AISI) — the government body responsible for evaluating frontier AI risk in the United Kingdom — conducted the benchmark evaluations that underpin both initiatives’ capability claims, using the same 95-task CTF suite across both models, which allows direct comparison without vendor-controlled testing conditions. The specific gap this addresses is the absence of a verified, government-audited baseline for AI-assisted offensive security tools entering enterprise markets: before AISI’s evaluations, vendors could make capability claims without an independent reference point. Both Daybreak and Glasswing arrive as the security industry confronts a generation shift — GPT-5.4 at 52.4% and Opus 4.7 at 48.6% were capable but not reliably useful for Expert-tier tasks; the jump to the 68–71% range changes the calculus for security operations centers considering AI augmentation. The competitive pressure is triangular: OpenAI and Anthropic are competing directly, but Google is already running Mythos Preview through Vertex AI and shipping its own Security Operations agents, meaning Anthropic’s model is simultaneously a Glasswing asset and a Google product — a structural tension that Daybreak does not face. The timing of Glasswing’s launch six weeks before Daybreak, combined with the identical partner roster, suggests both companies were aware of each other’s roadmaps and accelerated to establish access architecture before capability differentiation became impossible.

💡 AIUniverse Analysis

Our reading: The genuine advance here is not the benchmark scores — it is the AISI-verified confirmation that the 52%-to-71% jump in Expert CTF performance reflects a structural change in how these models handle multi-step reasoning under adversarial conditions, not just prompt engineering improvements. AISI’s framing — that cyber capability is emerging as a by-product of general autonomy and coding improvements — means security teams are not waiting for purpose-built offensive AI; it is arriving as a side effect of models getting better at everything else. That changes the procurement conversation from “is this tool ready” to “how do we govern what is already capable.”

The shadow is the access architecture itself. Both Daybreak and Glasswing are designed to keep value inside their respective commercial envelopes — tiered trust frameworks, consortium models, and partner rosters are not neutral infrastructure choices; they are revenue and control mechanisms. Security teams that adopt either platform are not just choosing a model; they are choosing a vendor relationship that will shape which vulnerabilities get prioritized, which disclosure protocols apply, and which audit trails are available to regulators. The fact that open-weight contenders can replicate flagship vulnerability analysis at a fraction of the cost means the moat is entirely in the access layer — and access layers have historically been where vendor lock-in compounds quietly until switching costs become prohibitive.

For this to matter in 12 months, one of two things must be true: either the tiered trust frameworks produce measurably better security outcomes than unstructured model access — which requires public, auditable evidence — or the consortium and partner models consolidate enough of the enterprise security market that the access architecture becomes the de facto standard, regardless of whether it is technically superior.

⚖️ AIUniverse Verdict

👀 Watch this space. AISI’s independent evaluation confirms genuine capability at the Expert CTF tier, but with Mythos finishing “The Last Ones” three times in ten attempts and GPT-5.5 finishing it two times in ten, neither model is reliable enough to anchor a security program — and the real test is whether the access architectures around them produce better outcomes than the models alone.

🎯 What This Means For You

Founders & Startups: The identical partner roster across Daybreak and Glasswing signals that large security vendors are hedging, not committing — startups building on either platform should assume the access architecture, not the model score, is what will determine long-term viability.

Developers: Codex Security’s repository scoping, patch generation, and CI integration tooling within Daybreak means the workflow integration question is already answered for OpenAI’s stack; Glasswing’s open-source maintainer access layer via the Linux Foundation offers a different entry point for developers embedded in open-source infrastructure.

Enterprise & Mid-Market: The practical recommendation splits cleanly: Daybreak’s tiered access scales to broad security team rollout and explicit red-team workflows; Glasswing’s consortium model suits deep integration with a small set of critical systems where hands-on Anthropic engagement and disclosure protocols matter more than scale.

General Users: The 32-step “The Last Ones” simulation — requiring around 20 hours for a human expert — being partially completable by both models means the tools that protect enterprise infrastructure are being stress-tested against AI-assisted attacks; the governance frameworks being built now will determine whether that capability is used defensively or exploited offensively.

⚡ TL;DR

  • What happened: OpenAI’s Daybreak (GPT-5.5, 71.4% Expert CTF) and Anthropic’s Glasswing (Mythos Preview, 68.6%) launched as near-identical cybersecurity initiatives with the same three major security partners, independently evaluated by the UK AI Security Institute.
  • Why it matters: Benchmark parity means the competition has shifted entirely to access architecture — tiered trust frameworks versus consortium models — and security teams adopting either platform are choosing a vendor relationship, not just a model.
  • What to do: Map your security team’s scale and integration depth before committing: Daybreak for broad, tiered rollout and offensive workflows; Glasswing for deep, consortium-level integration with critical systems — and watch whether open-weight alternatives close the capability gap before either access architecture locks in.

📖 Key Terms

Capture-the-flag suite (CTF)
A structured set of cybersecurity challenges — in this context, AISI’s 95-task Expert-level battery — used to measure how well an AI model can autonomously identify and exploit vulnerabilities under controlled conditions.
Tiered trust framework
Daybreak’s access control model, in which different levels of verified users receive different degrees of access to GPT-5.5 variants, with audit trails and verification workflows governing who can use offensive-capable features.
Consortium
Glasswing’s partner model, in which a defined group of organizations — including the Linux Foundation and named security vendors — receive structured access, credit commitments, and disclosure protocols rather than open-market tiered pricing.
Agent harness
The scaffolding around a language model that gives it tools, memory, and workflow integration — Codex Security functions as Daybreak’s agent harness, adding repository scoping, patch generation, and CI pipeline capabilities to GPT-5.5.

📎 Sources

Sources: The New Stack

Analysis based on reporting by The New Stack. Original article here.

By AI Universe

AI Universe