Long context has been a headline feature since 2024. The arms race — first 100K, then 200K, then 1M tokens — has generated more marketing copy than practical guidance on what these numbers actually mean for the people building software with AI agents.
On March 13, 2026, Anthropic made the 1M token context window generally available on Claude Sonnet 4.6 and Claude Opus 4.6. No beta header. No per-token surcharge above 200K. Standard pricing throughout. That is the administrative detail. The practical question is: what changes?
## From Beta to Standard
Until March 13, accessing more than 200K tokens on Claude required including the `anthropic-beta: context-1m-2025-08-07` header in your API requests and paying a long-context premium, a per-token surcharge applied to everything above 200K tokens. Both the header requirement and the surcharge are now gone.
If you were already using the beta header, you can leave it in place — it is silently ignored. No code changes required. If you were staying under 200K to avoid the premium, that constraint is removed. Opus 4.6 at $5/$25 per million input/output tokens applies uniformly whether you send 9,000 tokens or 900,000.
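As a back-of-envelope check of what uniform pricing means, here is a minimal sketch using the $5/$25 per MTok Opus 4.6 rates quoted above; the rates come from this post, the rest is plain arithmetic:

```python
# Cost of a single request at the flat Opus 4.6 rates quoted above
# ($5 input, $25 output per million tokens), with no long-context surcharge.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Flat pricing: the same per-token rate applies below and above 200K."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# A 9,000-token prompt and a 900,000-token prompt bill at the same rate:
small = request_cost(9_000, 1_000)    # 0.045 + 0.025 = $0.07
large = request_cost(900_000, 4_000)  # 4.50  + 0.10  = $4.60
```

Before March 13, the second call would have carried a surcharge on the 700K tokens above the 200K line; now both are priced by the same two constants.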
This is not a new capability. Claude has supported 1M context since August 2025. What changed is that the capability is now first-class, not experimental — and the pricing model no longer penalizes you for using it.
## Where Claude Sits in the Long-Context Market
The 1M context field is more crowded than it was a year ago.
| Model | Context Window | Notes |
|---|---|---|
| Claude Opus 4.6 / Sonnet 4.6 | 1M tokens | GA, standard pricing |
| Gemini 3.1 Pro | 1M tokens | Released February 2026 |
| GPT-5.4 | 1M tokens | Released March 2026; $2.50/MTok input |
| GPT-5.3-Codex | 400K tokens | OpenAI’s coding model; capped here |
The raw token count comparison understates what matters: recall quality at long context lengths. Anthropic benchmarks Claude at 78.3% on MRCR v2, a needle-in-a-haystack recall evaluation at extreme context lengths, the highest published score among frontier models at the 1M mark. Gemini 3.1 Pro matches on window size but trails on MRCR v2. GPT-5.3-Codex, OpenAI's dedicated coding model, is capped at 400K, a real disadvantage for large-codebase work.
GPT-5.4 reaches 1M with cheaper input pricing, but for coding agents the differentiation lies in recall quality at the extremes, not raw token count. A model that can ingest one million tokens but reliably retrieves only what appears at the start or end of the window is less useful than one with 78% recall distributed uniformly across the whole thing.
## What 1M Context Actually Unlocks for Coding
The honest answer is that 1M context does not change what is possible for most tasks — it changes what is convenient and what is reliable.
**Whole-codebase analysis without chunking.** A typical mid-sized production application — API server, frontend, database migrations, test suite, CI configuration, documentation — fits inside 1M tokens. Previously, you would chunk the codebase, retrieve relevant sections via embeddings or keyword search, and feed those chunks to the model. That process introduces retrieval errors: if the relevant code is not retrieved, the model works with incomplete context. At 1M context, you feed everything and let the model find what it needs. Fewer retrieval errors, more complete analysis.
**Multi-step agentic loops without context loss.** A long Claude Code session — search logs, cross-reference source code, examine test failures, trace a bug through five files, propose a fix, revise based on test output — accumulates context quickly. At 200K, that session compacts. Compaction means earlier context is summarized or dropped; the agent may forget decisions it made in step three when it is working on step twelve. Claude Code teams have measured a 15% reduction in compaction events since 1M context became standard. Sessions run longer before the agent loses its grip on earlier work.
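The effect of the larger window on compaction can be sketched with a toy model of accumulated context, assuming a fixed token cost per agentic step; the 25K-per-step figure is illustrative:

```python
def turns_before_compaction(turn_tokens: list[int], window: int) -> int:
    """Number of turns that fit before accumulated context exceeds the
    window and the session must compact (summarize or drop earlier turns)."""
    total = 0
    for i, tokens in enumerate(turn_tokens):
        total += tokens
        if total > window:
            return i  # compaction needed before turn i completes
    return len(turn_tokens)

# A session accumulating ~25K tokens per agentic step:
session = [25_000] * 40
# at 200K it compacts after 8 turns; at 1M all 40 turns fit uncompacted
```

In this toy model the 1M window gives the agent five times the uncompacted horizon, which is the mechanism behind the reduction in compaction events described above.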
**Richer input alongside code.** Up to 600 images or PDF pages per request (up from approximately 100). An architecture diagram, an API spec in PDF, a runbook screenshot, and the relevant source code can all live in the same context. This matters for the kind of reasoning that starts from “here is how this system was designed to work” and ends at “here is why it does not.”
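A sketch of assembling such a mixed request. The base64 image block shape follows the public Messages API; the 600-block ceiling is the limit cited above, enforced here as a client-side guard:

```python
import base64

MAX_VISUAL_BLOCKS = 600  # per-request limit cited above (up from ~100)

def build_content(question: str, images: list[bytes]) -> list[dict]:
    """Assemble a Messages API content list mixing image blocks and text."""
    if len(images) > MAX_VISUAL_BLOCKS:
        raise ValueError(
            f"{len(images)} images exceeds the {MAX_VISUAL_BLOCKS}-block limit")
    blocks = [
        {"type": "image",
         "source": {"type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(img).decode()}}
        for img in images
    ]
    blocks.append({"type": "text", "text": question})
    return blocks
```

The resulting list goes straight into a message's `content` field alongside the source code, so diagram and implementation are reasoned over together.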
**For Spec-Driven Development specifically:** the 1M window is the right fit for the spec-to-implementation workflow. A complete specification document, the existing codebase the implementation must integrate with, the test suite the implementation must pass, and prior implementation history can coexist in a single context. The model plans and implements against the whole picture, not a summarized version of it.
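One way to lay out that whole picture is as labeled sections of a single prompt; the section names here are illustrative, not a prescribed SDD format:

```python
def build_sdd_prompt(spec: str, codebase: str, tests: str, history: str = "") -> str:
    """Hold the full SDD picture in one context: spec, existing code,
    the tests the implementation must pass, and optional prior history."""
    sections = [
        ("SPECIFICATION", spec),
        ("EXISTING CODEBASE", codebase),
        ("TEST SUITE TO PASS", tests),
    ]
    if history:
        sections.append(("PRIOR IMPLEMENTATION HISTORY", history))
    return "\n\n".join(f"=== {name} ===\n{body}" for name, body in sections)
```

Because everything fits, there is no step where a summarizer decides which parts of the spec or test suite the model gets to see.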
## The Other Updates That Shipped With It
The 1M GA was the headline, but the surrounding API changelog from March and April 2026 is worth reviewing:
**Message Batches `max_tokens` raised to 300K (March 30).** For batch code generation — generating implementations for dozens of spec files simultaneously — the previous output ceiling limited what you could produce per batch item. 300K `max_tokens` per batch item removes that ceiling for most practical purposes.
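A sketch of one batch entry at the new ceiling. The `custom_id` + `params` shape follows the public Message Batches API; the model ID is the hypothetical one discussed in this post:

```python
def batch_request(custom_id: str, spec_text: str) -> dict:
    """One entry for the Message Batches API."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": "claude-opus-4-6",  # hypothetical model ID from this post
            "max_tokens": 300_000,       # the new per-item output ceiling
            "messages": [{"role": "user",
                          "content": f"Implement this spec:\n\n{spec_text}"}],
        },
    }

# Dozens of spec files become one batch of independent generation jobs:
batch = [batch_request(f"spec-{i}", f"(spec body {i})") for i in range(40)]
```

Each entry gets the full output budget independently, so a single large implementation no longer has to be split across batch items.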
**`thinking.display: "omitted"` field (March 16).** When using Claude’s extended thinking mode, you can now suppress the chain-of-thought from the response without losing the multi-turn continuity that requires the thinking signature. The model still thinks; you just do not receive, or pay to transmit, the thinking content. Useful for production agents where thinking tokens are overhead, not output.
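A hedged sketch of a request using the new field, assuming it nests under the standard `thinking` configuration as the dotted path `thinking.display` suggests:

```python
# Request params with extended thinking enabled but suppressed from the
# response. "display": "omitted" is the new field described above; its
# placement inside the thinking object is an assumption from the dotted path.
params = {
    "model": "claude-opus-4-6",  # hypothetical model ID from this post
    "max_tokens": 8_000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4_000,  # standard extended-thinking budget field
        "display": "omitted",    # model still thinks; content is not returned
    },
    "messages": [{"role": "user", "content": "Trace the failing test to its cause."}],
}
```

For a production agent, the saved bandwidth and output-token cost of the thinking stream is pure overhead reclaimed; the quality of the final answer is unchanged.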
**Models API capability fields (March 18).** `GET /v1/models/{model_id}` now returns `max_input_tokens`, `max_tokens`, and a `capabilities` object. This is plumbing, but useful plumbing: agents can now introspect the limits of the model they are using rather than hardcoding them.
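A sketch of the introspection this enables, assuming a response shaped like the changelog entry describes; the field values below are illustrative:

```python
def clamp_to_model_limit(model_info: dict, prompt_tokens: int) -> int:
    """Respect the advertised input limit instead of hardcoding window
    sizes. Field names follow the March 18 changelog entry."""
    return min(prompt_tokens, model_info["max_input_tokens"])

# Illustrative response in the shape GET /v1/models/{model_id} is
# described to return (IDs and values are assumptions, not API output):
info = {
    "id": "claude-opus-4-6",
    "max_input_tokens": 1_000_000,
    "max_tokens": 128_000,
    "capabilities": {"vision": True},
}
```

An agent that reads these fields at startup keeps working unchanged when it is pointed at a model with a smaller window.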
**Haiku 3 retires April 20.** If you are still running `claude-3-haiku-20240307` anywhere, migrate to `claude-haiku-4-5-20251001` before April 20. Requests to the old model ID will return an error after that date.
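A minimal migration guard, mapping the retiring ID to its replacement at request time so stragglers do not start erroring on April 20:

```python
# Retired -> replacement model IDs, from the deprecation notice above.
RETIRED = {"claude-3-haiku-20240307": "claude-haiku-4-5-20251001"}

def resolve_model(model_id: str) -> str:
    """Swap a retired model ID for its replacement before sending a request."""
    return RETIRED.get(model_id, model_id)
```

Dropping this in front of your API client turns the hard cutoff into a one-line config change.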
## What 1M Context Does Not Solve
Being precise about this matters because the marketing around long context tends to oversell.
**Context rot is real.** All long-context models degrade at extreme lengths. Information reliably retrieved near the beginning and end of the window is less reliably retrieved from the middle. A 78.3% MRCR v2 score is excellent; it also means roughly one in five needle-in-a-haystack retrievals fails. For tasks where missing a critical detail matters, such as security analysis or correctness-sensitive refactoring, long context does not substitute for careful prompt design.
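To see why 78.3% is both excellent and insufficient, treat each retrieval as an independent draw at the benchmark rate. Real retrievals are not independent, so this is purely illustrative arithmetic:

```python
RECALL = 0.783  # MRCR v2 score cited above

def p_all_retrieved(k: int) -> float:
    """Chance of retrieving all k critical details, assuming each retrieval
    succeeds independently at the benchmark rate (an idealization)."""
    return RECALL ** k

# one detail: ~78%; five details: ~29%; ten details: ~9%
```

The decay is the point: a task whose correctness depends on ten scattered details cannot lean on raw long-context recall alone.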
**Token cost compounds.** In a multi-turn agentic session, every turn reprocesses the accumulated context. A session that runs to 500K tokens is expensive on a per-turn basis. The 1M window raises what is possible per session; prompt discipline still determines what is economical.
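Why reprocessing compounds: each turn's input bill covers everything accumulated so far, so input cost grows roughly quadratically with session length. A sketch at the $5/MTok input rate quoted above, ignoring prompt caching, which would reduce this substantially:

```python
INPUT_PER_MTOK = 5.00  # Opus 4.6 input rate quoted above

def session_input_cost(turn_tokens: list[int]) -> float:
    """Each turn reprocesses all context accumulated so far (no caching),
    so input cost grows roughly quadratically with session length."""
    total_cost, context = 0.0, 0
    for tokens in turn_tokens:
        context += tokens
        total_cost += context * INPUT_PER_MTOK / 1_000_000
    return total_cost

# 20 turns of 25K tokens each ends at 500K of context; the input side
# processes 25K + 50K + ... + 500K = 5.25M tokens, about $26.25.
```

A session that merely *fits* in the window can still be uneconomical to run turn after turn, which is exactly the prompt-discipline point.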
**Latency scales with context.** Processing 1M tokens takes more compute than processing 100K tokens. For real-time interactive use cases — completions that need to appear in under two seconds — the 1M window is not the right tool. Use Haiku 4.5 or Claude Code’s compaction API for latency-sensitive paths; reserve the 1M window for the planning and analysis phases of an agentic workflow.
## The Practical Conclusion
The 1M context GA is not a feature launch — it is a pricing and availability change for a capability that already existed. The practical effect is that the constraint disappears. Developers who were staying under 200K to avoid the surcharge no longer have to. Teams building coding agents that maintained chunking pipelines to stay within window limits can simplify their architecture.
The capability that matters most for coding agents — ingesting a full codebase and reasoning across it without retrieval gaps — is now available at standard rates on the models best suited for agentic work.
For SDD workflows, the implication is direct: write the spec, point the agent at the whole codebase, and let it implement. No chunking, no partial context, no “here is a summary of the files I did not include.” The context window is large enough to hold the whole problem.
## Sources
- 1M context window generally available — Anthropic Blog
- Claude API Release Notes — Anthropic
- Model Deprecations — Anthropic
- Anthropic adds 1M context to Opus 4.6 and Sonnet 4.6 — Medium
- Claude 1M context guide 2026 — Karol Zieminski / Substack
- Claude Code 1M context for large codebases — Verdent
- AI context window comparison 2026 — Digital Applied
- Gemini 3.1 Pro context window — MarkTechPost