On February 5, 2026, at approximately 6:40 PM ET, Anthropic released Claude Opus 4.6. Twenty minutes later, OpenAI dropped GPT-5.3-Codex. Within 48 hours, $285 billion had been wiped from global software stocks. Three days later, both companies aired competing Super Bowl ads.

This wasn't just a product launch. It was the opening salvo of the AI coding wars, and the fallout is still reshaping the tech industry.

We've spent the past three days testing both models, reading the system cards, verifying benchmarks, and separating the marketing from reality. Here's what you actually need to know.

The 20-Minute War

The timing wasn't coincidental. Anthropic went first with Opus 4.6, and OpenAI responded almost instantly with GPT-5.3-Codex, a move VentureBeat described as a deliberate "counter-punch" amid an already heated week between the two companies.

Both models represent massive leaps in AI coding capability. Both claim to be the best. And both come with unprecedented implications for cybersecurity, software development, and the future of white-collar work.

Let's break down what each model actually brings to the table.

Claude Opus 4.6: The Highlights

Opus 4.6 is Anthropic's most capable model to date, and the improvements over Opus 4.5 are substantial:

  • 1 million token context window (beta) — 5x larger than its predecessor's 200K limit
  • 128K token max output — doubled from Opus 4.5's 64K
  • Agent Teams — multiple Claude Code instances coordinate work in parallel, with a lead session assigning tasks and synthesizing results
  • Adaptive Thinking — four effort levels (low, medium, high, max) replace the old binary extended thinking toggle, letting the model dynamically decide how deeply to reason (see the API sketch after this list)
  • Context Compaction API — automatic server-side summarization for effectively infinite conversations
  • Fast Mode — 2.5x faster output at 6x the price ($30/$150 per million tokens), with a 50% introductory discount through February 16
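
To make the API-facing features concrete, here's a minimal sketch of what an Opus 4.6 request might look like through Anthropic's Python SDK. The model ID, the beta flag for the 1M-token context, and the `effort` field name are assumptions inferred from the feature list above, not confirmed identifiers; verify against the official docs before relying on any of them.

```python
# Minimal sketch of an Opus 4.6 request via Anthropic's Python SDK.
# The model ID, beta flag, and "effort" field below are hypothetical,
# inferred from the feature list above -- check the official docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",             # hypothetical model ID
    max_tokens=128_000,                  # the doubled output ceiling
    extra_headers={
        "anthropic-beta": "context-1m",  # hypothetical 1M-context beta flag
    },
    extra_body={"effort": "high"},       # Adaptive Thinking: low|medium|high|max
    messages=[{"role": "user", "content": "Audit this diff for injection bugs."}],
)
print(response.content[0].text)
```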

But the headline that grabbed everyone's attention was this: Opus 4.6 autonomously discovered 500+ previously unknown zero-day vulnerabilities in open-source software using only out-of-the-box capabilities. Each was validated by Anthropic researchers or independent security experts. For one Ghostscript flaw, Claude turned to the project's Git commit history after both fuzzing and manual analysis failed, demonstrating problem-solving approaches that surprised even Anthropic's own team.

GPT-5.3-Codex: The Highlights

OpenAI's response was equally ambitious:

  • 400K token context window with a "Perfect Recall" attention mechanism that prevents information loss in the middle of long contexts
  • 128K token max output — matching Opus 4.6
  • Auto-Router architecture — automatically switches between a fast "Reflex Mode" for simple queries and "Deep Reasoning Mode" for complex problems (see the sketch after this list)
  • Interactive Steering — users can adjust, redirect, and provide feedback mid-task without losing context
  • Deep Diffs — shows why a code change was made, not just what changed
  • 25% faster inference than GPT-5.2-Codex, using fewer output tokens to achieve equivalent results
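
For comparison, a similar sketch against OpenAI's Responses API. The model ID here is an assumption; note that, per the list above, routing between Reflex Mode and Deep Reasoning Mode happens server-side, so the call itself carries no mode flag.

```python
# Minimal sketch of a GPT-5.3-Codex request via OpenAI's Python SDK.
# The model ID is hypothetical; the Auto-Router described above picks
# Reflex Mode or Deep Reasoning Mode on the server, so no flag is needed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.3-codex",  # hypothetical model ID
    input="Convert this cron-driven shell script into idempotent steps.",
)
print(response.output_text)
```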

The controversial headline: GPT-5.3-Codex "helped build itself." OpenAI says early versions of the model debugged its own training runs, managed deployment infrastructure, and diagnosed test results. However — and this is important — OpenAI's own system card explicitly states the model "does not reach High capability on AI self-improvement." The reality sits between the marketing and the safety assessment: it was a useful tool during development, but humans remained in charge.

What the system card does confirm is more concerning: GPT-5.3-Codex is the first OpenAI model to receive a "High" cybersecurity designation under their Preparedness Framework, meaning it could potentially automate end-to-end cyber operations against hardened targets. OpenAI is taking a precautionary approach, gating full API access and committing $10 million in API credits for cyber defense.

Head-to-Head Benchmarks

Here's where it gets interesting. Important caveat: Anthropic and OpenAI report on different SWE-Bench variants (Verified vs. Pro), making direct comparison on that specific benchmark unreliable. We've included all available scores from official sources.

Benchmark                Opus 4.6         GPT-5.3-Codex    Winner
Terminal-Bench 2.0       65.4%            77.3%            Codex (+11.9)
GPQA Diamond             91.3%            73.8%            Opus (+17.5)
ARC AGI 2                68.8%            n/a              Opus
OSWorld                  72.7%            64.7%            Opus (+8.0)
BrowseComp               84.0%            n/a              Opus
Cybersecurity CTF        n/a              77.6%            Codex
SWE-Lancer IC Diamond    n/a              81.4%            Codex
MMLU Pro                 85.1%            82.9%            Opus (+2.2)
SWE-Bench Verified       80.8%            Not reported*    n/a
SWE-Bench Pro            Not reported*    56.8%            n/a

*SWE-Bench Verified and SWE-Bench Pro are different benchmark variants with different methodologies and difficulty levels. Direct numerical comparison between them is not valid.

The pattern is clear: Codex dominates terminal-based automation and sustained coding tasks. Opus leads in reasoning, knowledge, browsing, and abstract problem-solving. Every.to's independent "LFG" benchmark (a practical test involving React, 3D visualization, and a full e-commerce build) scored Opus 9.25/10 versus Codex's 7.5/10, with the gap widening on complex, under-specified requirements.

Opus 4.6's Standout Result

Its ARC AGI 2 score jumped from 37.6% (Opus 4.5) to 68.8%, a 31.2-percentage-point gain that nearly doubles the previous score. That's the largest single-benchmark jump we've seen in a frontier model update, and it suggests genuine advances in abstract reasoning, not just benchmark optimization.

Pricing: Where They Stand

                    Claude Opus 4.6         GPT-5.3-Codex
API Input           $5 / MTok               Not yet announced
API Output          $25 / MTok              Not yet announced
Context Window      1M tokens (beta)        400K tokens
Max Output          128K tokens             128K tokens
Batch Discount      50% ($2.50 / $12.50)    TBD
Cache Savings       Up to 90%               TBD
Subscription        Claude Pro $20/mo       ChatGPT Plus $20/mo

As of February 8, OpenAI has not released official API pricing for GPT-5.3-Codex. The predecessor GPT-5.2-Codex was priced at $1.75 input / $14 output per million tokens — significantly cheaper than Opus 4.6. If OpenAI maintains similar pricing, Codex will likely be the more affordable option for pure coding tasks.

Worth noting: Opus 4.6 is priced identically to Opus 4.5 ($5/$25), meaning all the improvements, including the 5x context window expansion, come at no additional cost over the previous version.
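
To put those numbers in perspective, here's a quick back-of-the-envelope cost calculation using Opus 4.6's list prices from the table above; the token counts are illustrative, not from any real workload.

```python
# Opus 4.6 cost at list price: $5 input / $25 output per million tokens.
def opus_cost_usd(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    rate = 0.5 if batch else 1.0  # 50% batch discount per the table above
    return rate * (input_tokens / 1e6 * 5.00 + output_tokens / 1e6 * 25.00)

# Example: one large-context request (200K tokens in, 8K tokens out)
print(opus_cost_usd(200_000, 8_000))              # ~$1.20 at list price
print(opus_cost_usd(200_000, 8_000, batch=True))  # ~$0.60 via the Batch API
```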

The $285 Billion "SaaSpocalypse"

The simultaneous release of both models, along with Anthropic's industry-specific Claude Cowork plugins and OpenAI's Frontier agent platform, triggered a massive selloff across global software stocks. Jeffrey Favuzza at Jefferies' equity trading desk coined the term "SaaSpocalypse" to describe what he called "very much 'get me out' style selling."

The fear? AI agents could render the SaaS subscription model obsolete. If an AI can do the work that software used to do, why pay per seat?

Company            Ticker    YTD Drop
Figma              FIGM      -40%
HubSpot            HUBS      -39%
Shopify            SHOP      -38%
Atlassian          TEAM      -35%
Intuit             INTU      -34%
Salesforce         CRM       -26%
LegalZoom          LZ        -20%
Thomson Reuters    TRI       -18% (single day)

The iShares Software ETF (IGV) dropped over 20%. The JPMorgan US Software Index fell 7% in a single trading day. Even advertising giants like WPP (-12%), Omnicom (-11%), and Publicis (-9%) were caught in the downdraft.

The Super Bowl Ad War

As if the stock market carnage wasn't dramatic enough, both companies aired competing ads during Super Bowl LX on February 8.

Anthropic's campaign was a direct shot at OpenAI, with the tagline: "Ads are coming to AI. But not to Claude." One 60-second pregame spot showed a man asking a chatbot for advice on communicating with his mom, only for the response to morph into an ad for a fictional cougar-dating site called "Golden Encounters." The message was clear: AI that serves ads can't serve you.

OpenAI's 60-second Codex commercial took a different approach, positioning AI coding as part of a long lineage of human creation and building.

Sam Altman didn't take the jab quietly. He called Anthropic's ads "funny" but "clearly dishonest," arguing that OpenAI "would obviously never run ads in the way Anthropic depicts them." He went further, calling Anthropic "authoritarian" and claiming they block Claude Code usage from "companies they don't like" — including OpenAI. His closing shot: "One authoritarian company won't get us there on their own... It is a dark path."

The Honest Assessment: Which Should You Use?

After testing both models and reviewing the data, here's our take:

Choose Claude Opus 4.6 if you need:

  • Deep reasoning on complex, ambiguous problems
  • Multi-agent workflows (Agent Teams)
  • Security auditing and vulnerability research
  • Long-context tasks (1M tokens vs 400K)
  • Higher ceiling on open-ended creative coding

Choose GPT-5.3-Codex if you need:

  • Fast, reliable terminal automation
  • Consistent output with fewer errors and lower variance
  • Mid-task steering and redirection
  • Sustained autonomous coding sessions
  • Budget-friendly API pricing (once announced)

As Every.to put it in their "Great Convergence" review: "Opus has a higher ceiling but more variance; Codex is more reliable with fewer errors." Neither model dominates universally — and increasingly, teams are mixing and matching based on the task.
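
In practice, "mixing and matching" usually means nothing fancier than a thin routing layer in front of both APIs. A minimal sketch, with task categories drawn from the lists above and hypothetical model IDs:

```python
# Route each task category to whichever model the lists above favor.
# Both model IDs are hypothetical placeholders.
ROUTES = {
    "terminal_automation": "gpt-5.3-codex",
    "sustained_autonomous_coding": "gpt-5.3-codex",
    "security_audit": "claude-opus-4-6",
    "long_context_analysis": "claude-opus-4-6",
    "open_ended_creative_build": "claude-opus-4-6",
}

def pick_model(task_type: str, default: str = "claude-opus-4-6") -> str:
    """Return the preferred model ID for a task category."""
    return ROUTES.get(task_type, default)

assert pick_model("terminal_automation") == "gpt-5.3-codex"
```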

The Writing Quality Caveat

One criticism worth noting: multiple users on Reddit have reported that Opus 4.6's writing quality regressed compared to Opus 4.5. Prose is described as "flatter, more generic" with more formulaic constructions. The theory is that reinforcement learning optimizations for reasoning came at the cost of natural prose quality. The community consensus: use 4.6 for coding, stick with 4.5 for writing. Anthropic has not formally addressed this.

What's Coming Next

The AI coding race isn't slowing down. Here's what we're watching:

  • DeepSeek V4 (expected ~February 17) — focuses on repo-level reasoning and a novel "Engram" memory system for near-infinite context retrieval. Expected to be open-weight.
  • Gemini 3 Pro — Google's model already leads LMArena's text leaderboard for user preference and effectively matches Opus 4.6 on GPQA Diamond (91.9% vs. 91.3%)
  • Claude Sonnet 5 — Anthropic's next mid-tier model, expected with gains in coding and reasoning

February 2026 might be remembered as the month the AI coding wars went fully mainstream — complete with stock market crashes, Super Bowl ads, and models that help build themselves. The question isn't which model wins anymore. It's whether your workflow is ready for what's next.


Sources