Claude Opus 4.6 vs GPT-5.3-Codex: The AI Coding War That Crashed the Stock Market
On February 5, 2026, at approximately 6:40 PM ET, Anthropic released Claude Opus 4.6. Twenty minutes later, OpenAI dropped GPT-5.3-Codex. Within 48 hours, $285 billion had been wiped from global software stocks. Three days later, both companies aired competing Super Bowl ads.
This wasn't just a product launch. It was the opening salvo of the AI coding wars, and the fallout is still reshaping the tech industry.
We've spent the past three days testing both models, reading the system cards, verifying benchmarks, and separating the marketing from reality. Here's what you actually need to know.
The 20-Minute War
The timing wasn't coincidental. Anthropic went first with Opus 4.6, and OpenAI responded almost instantly with GPT-5.3-Codex, a move VentureBeat described as a deliberate "counter-punch" amid an already heated week between the two companies.
Both models represent massive leaps in AI coding capability. Both claim to be the best. And both come with unprecedented implications for cybersecurity, software development, and the future of white-collar work.
Let's break down what each model actually brings to the table.
Claude Opus 4.6: The Highlights
Opus 4.6 is Anthropic's most capable model to date, and the improvements over Opus 4.5 are substantial:
- 1 million token context window (beta) — 5x its predecessor's 200K limit
- 128K token max output — doubled from Opus 4.5's 64K
- Agent Teams — multiple Claude Code instances coordinate work in parallel, with a lead session assigning tasks and synthesizing results
- Adaptive Thinking — four effort levels (low, medium, high, max) replace the old binary extended-thinking toggle, letting the model decide how deeply to reason (see the API sketch after this list)
- Context Compaction API — automatic server-side summarization for effectively infinite conversations
- Fast Mode — 2.5x faster output at 6x the price ($30/$150 per million tokens), with a 50% introductory discount through February 16
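How these controls surface in the API can't be confirmed from the announcement alone, so treat the following as a minimal sketch built on Anthropic's Python SDK: the model ID, the `effort` field, and the beta flag are our assumptions, not documented parameters.

```python
# Hedged sketch: one Opus 4.6 call with an explicit effort level.
# ASSUMPTIONS: the "claude-opus-4-6" model ID, the "effort" key in the
# thinking block, and the "context-1m" beta flag are guesses based on
# this article; none of them is confirmed API surface.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-opus-4-6",          # hypothetical model ID
    max_tokens=64_000,                # the new 128K ceiling would also be valid here
    thinking={"type": "adaptive", "effort": "high"},  # hypothetical effort control
    betas=["context-1m"],             # hypothetical flag for the 1M-token window
    messages=[
        {"role": "user", "content": "Review this repo for injection bugs."}
    ],
)
print(response.content[0].text)
```

If the Context Compaction API works as described, long sessions like this would no longer need client-side truncation; the server would summarize older turns automatically.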
But the headline that grabbed everyone's attention was this: Opus 4.6 autonomously discovered 500+ zero-day vulnerabilities in open-source software using only out-of-the-box capabilities. Each was validated by Anthropic researchers or independent security experts. For one Ghostscript flaw, Claude turned to the project's Git commit history after both fuzzing and manual analysis failed, a problem-solving approach that surprised even Anthropic's own team.
GPT-5.3-Codex: The Highlights
OpenAI's response was equally ambitious:
- 400K token context window with a "Perfect Recall" attention mechanism that OpenAI says prevents information loss in the middle of long contexts
- 128K token max output — matching Opus 4.6
- Auto-Router architecture — automatically switches between a fast "Reflex Mode" for simple queries and a "Deep Reasoning Mode" for complex problems (see the API sketch after this list)
- Interactive Steering — users can adjust, redirect, and provide feedback mid-task without losing context
- Deep Diffs — shows why a code change was made, not just what changed
- 25% faster inference than GPT-5.2-Codex, using fewer output tokens to achieve equivalent results
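OpenAI describes the Auto-Router and Interactive Steering at the product level only. As a point of comparison, here is what a single call might look like through the existing Responses API; the model ID comes from the announcement, and the idea that a reasoning-effort hint can nudge the router toward Deep Reasoning Mode is our assumption.

```python
# Hedged sketch: one GPT-5.3-Codex call via the OpenAI Responses API.
# ASSUMPTIONS: the model is exposed under this ID, and a reasoning-effort
# hint influences the Auto-Router; neither is confirmed by OpenAI.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.3-codex",         # hypothetical model ID
    input="Turn this flaky deploy script into idempotent steps.",
    reasoning={"effort": "high"},  # assumed nudge toward Deep Reasoning Mode
)
print(response.output_text)
```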
The controversial headline: GPT-5.3-Codex "helped build itself." OpenAI says early versions of the model debugged its own training runs, managed deployment infrastructure, and diagnosed test results. However — and this is important — OpenAI's own system card explicitly states the model "does not reach High capability on AI self-improvement." The reality sits between the marketing and the safety assessment: it was a useful tool during development, but humans remained in charge.
What the system card does confirm is more concerning: GPT-5.3-Codex is the first OpenAI model to receive a "High" cybersecurity designation under their Preparedness Framework, meaning it could potentially automate end-to-end cyber operations against hardened targets. OpenAI is taking a precautionary approach, gating full API access and committing $10 million in API credits for cyber defense.
Head-to-Head Benchmarks
Here's where it gets interesting. Important caveat: Anthropic and OpenAI report on different SWE-Bench variants (Verified vs. Pro), making direct comparison on that specific benchmark unreliable. We've included all available scores from official sources.
| Benchmark | Opus 4.6 | GPT-5.3-Codex | Winner |
|---|---|---|---|
| Terminal-Bench 2.0 | 65.4% | 77.3% | Codex (+11.9) |
| GPQA Diamond | 91.3% | 73.8% | Opus (+17.5) |
| ARC AGI 2 | 68.8% | — | Opus |
| OSWorld | 72.7% | 64.7% | Opus (+8.0) |
| BrowseComp | 84.0% | — | Opus |
| Cybersecurity CTF | — | 77.6% | Codex |
| SWE-Lancer IC Diamond | — | 81.4% | Codex |
| MMLU Pro | 85.1% | 82.9% | Opus (+2.2) |
| SWE-Bench Verified | 80.8% | Not reported* | — |
| SWE-Bench Pro | Not reported* | 56.8% | — |
*SWE-Bench Verified and SWE-Bench Pro are different benchmark variants with different methodologies and difficulty levels. Direct numerical comparison between them is not valid.
The pattern is clear: Codex dominates terminal-based automation and sustained coding tasks, while Opus leads in reasoning, knowledge, browsing, and abstract problem-solving. On Every.to's independent "LFG" benchmark (a practical test involving React, 3D visualization, and a full e-commerce build), Opus scored 9.25/10 to Codex's 7.5/10, with the gap widening on complex, under-specified requirements.
Opus 4.6's Standout Result
The ARC AGI 2 score jumped from 37.6% (Opus 4.5) to 68.8%, a 31.2 percentage point gain that nearly doubles the previous score. It's among the largest single-benchmark improvements we've seen in a frontier model update, and it suggests genuine advances in abstract reasoning, not just benchmark optimization.
Pricing: Where They Stand
| | Claude Opus 4.6 | GPT-5.3-Codex |
|---|---|---|
| API Input | $5 / MTok | Not yet announced |
| API Output | $25 / MTok | Not yet announced |
| Context Window | 1M tokens (beta) | 400K tokens |
| Max Output | 128K tokens | 128K tokens |
| Batch Discount | 50% ($2.50 / $12.50) | TBD |
| Cache Savings | Up to 90% | TBD |
| Subscription | Claude Pro $20/mo | ChatGPT Plus $20/mo |
As of February 8, OpenAI has not released official API pricing for GPT-5.3-Codex. The predecessor GPT-5.2-Codex was priced at $1.75 input / $14 output per million tokens — significantly cheaper than Opus 4.6. If OpenAI maintains similar pricing, Codex will likely be the more affordable option for pure coding tasks.
Worth noting: Opus 4.6 is priced identically to Opus 4.5 ($5/$25), meaning all the improvements, including the 5x context window expansion, come at no additional cost over the previous version.
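To make those numbers concrete, here is a small cost sketch using only the per-million-token prices quoted in this section; the Codex row uses GPT-5.2-Codex rates as a stand-in, since 5.3 pricing is unannounced.

```python
# Cost sketch from the per-million-token prices quoted in this article.
# NOTE: GPT-5.3-Codex pricing is unannounced; we substitute GPT-5.2-Codex
# rates ($1.75/$14) purely as a placeholder.

PRICES = {  # model: (input $/MTok, output $/MTok)
    "opus-4.6":          (5.00, 25.00),
    "opus-4.6-fast":     (30.00, 150.00),  # Fast Mode: 6x the base price
    "opus-4.6-batch":    (2.50, 12.50),    # 50% batch discount
    "codex (5.2 rates)": (1.75, 14.00),    # placeholder, not official
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the quoted rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 200K-token repo dump that yields a 20K-token patch.
for model in PRICES:
    print(f"{model:>18}: ${cost(model, 200_000, 20_000):.2f}")
```

At those rates the example call costs $1.50 on Opus 4.6, $9.00 in Fast Mode, $0.75 via batch, and roughly $0.63 at the placeholder Codex rates, which is why pricing could decide this race for high-volume coding workloads.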
The $285 Billion "SaaSpocalypse"
The simultaneous release of both models, along with Anthropic's industry-specific Claude Cowork plugins and OpenAI's Frontier agent platform, triggered a massive selloff across global software stocks. Jeffrey Favuzza at Jefferies' equity trading desk coined the term "SaaSpocalypse" to describe what he called "very much 'get me out' style selling."
The fear? AI agents could render the SaaS subscription model obsolete. If an AI can do the work that software used to do, why pay per seat?
| Company | Ticker | Decline (YTD unless noted) |
|---|---|---|
| Figma | FIGM | -40% |
| HubSpot | HUBS | -39% |
| Shopify | SHOP | -38% |
| Atlassian | TEAM | -35% |
| Intuit | INTU | -34% |
| Salesforce | CRM | -26% |
| LegalZoom | LZ | -20% |
| Thomson Reuters | TRI | -18% (single day) |
The iShares Software ETF (IGV) dropped over 20%. The JPMorgan US Software Index fell 7% in a single trading day. Even advertising giants like WPP (-12%), Omnicom (-11%), and Publicis (-9%) were caught in the downdraft.
The Super Bowl Ad War
As if the stock market carnage weren't dramatic enough, both companies aired competing ads during Super Bowl LX on February 8.
Anthropic's campaign was a direct shot at OpenAI, with the tagline: "Ads are coming to AI. But not to Claude." One 60-second pregame spot showed a man asking a chatbot for advice on communicating with his mom, only for the response to morph into an ad for a fictional cougar-dating site called "Golden Encounters." The message was clear: AI that serves ads can't serve you.
OpenAI's 60-second Codex commercial took a different approach, positioning AI coding as part of a long lineage of human creation and building.
Sam Altman didn't take the jab quietly. He called Anthropic's ads "funny" but "clearly dishonest," arguing that OpenAI "would obviously never run ads in the way Anthropic depicts them." He went further, calling Anthropic "authoritarian" and claiming they block Claude Code usage from "companies they don't like" — including OpenAI. His closing shot: "One authoritarian company won't get us there on their own... It is a dark path."
The Honest Assessment: Which Should You Use?
After testing both models and reviewing the data, here's our take:
Choose Claude Opus 4.6 if you need:
- Deep reasoning on complex, ambiguous problems
- Multi-agent workflows (Agent Teams)
- Security auditing and vulnerability research
- Long-context tasks (1M tokens vs 400K)
- Higher ceiling on open-ended creative coding
Choose GPT-5.3-Codex if you need:
- Fast, reliable terminal automation
- Consistent output with fewer errors and lower variance
- Mid-task steering and redirection
- Sustained autonomous coding sessions
- Budget-friendly API pricing (once announced)
As Every.to put it in their "Great Convergence" review: "Opus has a higher ceiling but more variance; Codex is more reliable with fewer errors." Neither model dominates universally — and increasingly, teams are mixing and matching based on the task.
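One way teams operationalize that mixing is a thin routing layer in front of both APIs. The sketch below is purely illustrative: the task categories and the model mapping just encode this article's guidance, not anything either vendor ships.

```python
# Illustrative task router encoding this article's model guidance.
# The categories, mapping, and model IDs are editorial choices, not
# vendor features or confirmed identifiers.

ROUTING = {
    "terminal_automation": "gpt-5.3-codex",    # Codex's strongest benchmark area
    "steered_refactor":    "gpt-5.3-codex",    # Interactive Steering
    "long_context_review": "claude-opus-4-6",  # 1M-token window
    "security_audit":      "claude-opus-4-6",  # vulnerability-research track record
    "open_ended_design":   "claude-opus-4-6",  # higher ceiling, more variance
}

def pick_model(task_type: str) -> str:
    """Route a task to a model; default to Codex for routine coding work."""
    return ROUTING.get(task_type, "gpt-5.3-codex")

print(pick_model("security_audit"))  # -> claude-opus-4-6
```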
The Writing Quality Caveat
One criticism worth noting: multiple users on Reddit have reported that Opus 4.6's writing quality regressed compared to Opus 4.5. Prose is described as "flatter, more generic" with more formulaic constructions. The theory is that reinforcement learning optimizations for reasoning came at the cost of natural prose quality. The community consensus: use 4.6 for coding, stick with 4.5 for writing. Anthropic has not formally addressed this.
What's Coming Next
The AI coding race isn't slowing down. Here's what we're watching:
- DeepSeek V4 (expected ~February 17) — focuses on repo-level reasoning and a novel "Engram" memory system for near-infinite context retrieval. Expected to be open-weight.
- Gemini 3 Pro — Google's model already leads LMArena's text leaderboard for user preference and slightly edges Opus 4.6 on GPQA Diamond (91.9% vs. 91.3%)
- Claude Sonnet 5 — Anthropic's next mid-tier model, expected with gains in coding and reasoning
February 2026 might be remembered as the month the AI coding wars went fully mainstream — complete with stock market crashes, Super Bowl ads, and models that help build themselves. The question isn't which model wins anymore. It's whether your workflow is ready for what's next.
Sources
- Anthropic — Claude Opus 4.6 Official Announcement
- OpenAI — Introducing GPT-5.3-Codex
- OpenAI — GPT-5.3-Codex System Card
- Vellum — Claude Opus 4.6 Benchmarks Explained
- VentureBeat — AI Coding Wars Heat Up
- Every.to — GPT-5.3 Codex vs Opus 4.6: The Great Convergence
- Fortune — Claude Opus 4.6 Triggers Stock Selloff
- CNBC — SaaS Software Stocks Selloff
- TechCrunch — Sam Altman vs Anthropic Super Bowl Ads
- Anthropic — API Pricing Documentation
- Fortune — GPT-5.3-Codex Cybersecurity Risks
- NxCode — SaaSpocalypse 2026