The Three-Layer AI Coding Stack That Nobody Planned (But Everyone Is Building)

Table of Contents

The AI coding tools narrative has spent two years waiting for a winner. Cursor versus Copilot versus Claude Code versus Codex — pick one, evangelize it, wait for the others to die. It is a satisfying story, and it is not what is actually happening.

What is happening is stratification. Cursor, Claude Code, and OpenAI Codex are each staking out distinct roles in a layered architecture, and the most productive developers are treating them the way they treat Terraform, Docker, and Kubernetes: not as competitors you choose between, but as components you compose.

The pattern has a name now — composition over consolidation — and there is hard evidence it is the direction the tooling ecosystem is moving.

The Three Layers
#

Layer 1: Orchestration (Cursor)
#

Cursor has spent 2026 repositioning itself away from “AI-enhanced IDE” and toward “agent control plane.” Cursor 3, released in April, is the clearest expression of that ambition. The centerpiece is the Agents Window: a management interface for running multiple coding agents simultaneously across local machines, cloud sandboxes, SSH connections, and Git worktrees — all from one view.

The /best-of-n command is the most telling feature. It takes a single task description and runs it simultaneously across multiple models in isolated worktrees, then surfaces the results for comparison. You pick the winner, or you combine the best parts. The logic is the same as terraform plan for infrastructure: generate options before committing. Model selection becomes an infrastructure decision driven by task characteristics, not brand loyalty.

This is an orchestration model. Cursor’s value at this layer is not in writing the code — it is in managing the fleet of agents that write the code, controlling which models get which tasks, and providing a unified interface over a heterogeneous set of execution environments.

Layer 2: Execution (Claude Code and Codex)
#

This is where code actually gets written. Claude Code and OpenAI Codex occupy the same layer and compete on it — but the competitive dynamics are more interesting than simple head-to-head.

Claude Code’s position at the execution layer is strong. A February 2026 survey of 906 software engineers put Claude Code at a 46% “most loved” rating — the highest in the field. SemiAnalysis estimates it accounts for approximately 4% of all public GitHub commits as of March 2026, with projections toward 20% by year-end. Claude Code’s terminal-native architecture, deep codebase understanding, and Anthropic’s commitment to agentic workflows give it structural advantages at complex, long-context execution tasks.

Codex has reached 3 million weekly active users on OpenAI’s side, running autonomous coding tasks in sandboxed cloud environments. It is fast, capable, and increasingly embedded in enterprise pipelines that were already built around the OpenAI API surface.

The interesting development is not who is ahead. It is that OpenAI published codex-plugin-cc — a plugin that allows Codex to run inside Claude Code as a review agent. Let that sink in: OpenAI released a plugin for a competitor’s terminal to extend that competitor’s tool. Rather than waiting for Claude Code users to switch to Codex, OpenAI embedded Codex where they already work.

This is infrastructure distribution thinking, not zero-sum competition thinking. And it creates a review layer that is genuinely valuable.

Layer 3: Review (Cross-Provider Verification)
#

The codex-plugin-cc capability exposes a structural problem with any single-model development workflow: the model that writes the code is poorly positioned to independently catch its own errors. Its training biases, its failure modes, and its blindspots are consistent. Self-review by the same model is not adversarial.

Running Codex as a review agent inside Claude Code introduces a different model’s perspective at verification time. The plugin supports standard code review, adversarial pressure-testing around authentication and race conditions, and automatic review gates that block completion if issues appear. The two models were trained on different data, by different teams, with different objectives. Their failure modes are not correlated. That is the point.

This is a pattern borrowed from security engineering — the difference between a developer testing their own code and a dedicated security team with adversarial intent. The emerging cross-provider review layer formalizes that logic at the AI level.

The Stack Developers Are Actually Building
#

The practical upshot of this stratification is visible in what developers are deploying at scale.

A survey by The Pragmatic Engineer in February 2026 found that many senior engineers have converged on a two-tool baseline: Cursor for daily IDE work, and Claude Code for complex tasks requiring deep codebase context. This combination runs approximately $40 per month and covers the full range of development scenarios these developers encounter.

For teams with more sophisticated requirements, the pattern extends: Cursor orchestrates a fleet of agents, Claude Code handles execution of complex multi-file tasks, Codex runs as a review agent via the plugin, and GitHub Actions handles CI integration. Each component does what it is good at. None of them is trying to replace the others.

This is compositional reasoning applied to tooling. It is how mature developers already think about their infrastructure stack — you do not choose between a load balancer and a database — and it is how AI coding tools are beginning to be understood.

Why Claude Code Wins at the Execution Layer (For Now)
#

The terminal-native architecture is not an aesthetic choice. It is an architectural one that determines what Claude Code can and cannot do.

Running in the terminal means Claude Code has native access to your full development environment: the file system, the shell, running processes, git history, environment variables, the actual test runner output. It does not mediate through an IDE’s extension API. It does not inherit an editor’s mental model of what a “project” is.

For tasks that require genuine codebase understanding — large refactors, multi-file changes that need to reason about the whole system, debugging failures that require understanding what changed and why — this full-context access matters. IDE-based tools are architecturally constrained to the view the editor exposes. Claude Code is not.

The autonomy claim is real but requires context. Claude Code’s Auto Mode, the /ultraplan cloud planning sessions, and now Routines are all expressions of the same thesis: the agent should be capable of doing more work without interrupting the developer. Not every task needs human approval at each step. The tools that accept this and build for it are structurally different from tools that treat every action as requiring human confirmation.

Cursor’s self-hosted cloud agents and Codex’s sandboxed execution model are attempts to reach similar autonomy. They are credible. But they started from the IDE-centric model and are working toward autonomy, while Claude Code started from the autonomous agent model and is working toward better UI. The starting positions matter.

What the Convergence Signals
#

The emergence of a three-layer stack does not mean the competition is over. Cursor is competing at the orchestration layer by building a fleet management interface. Claude Code is extending at the execution layer through Routines, Ultraplan, and cloud-native features. Codex is distributing itself as infrastructure through the plugin model.

Each company has a defensible position in the layer where they are strongest, and each is attempting to expand into adjacent layers. Anthropic’s routines are a move from pure execution into scheduled orchestration. Cursor’s /best-of-n is a move from orchestration into multi-model execution management.

The interoperability is real — OpenAI publishing a plugin for Anthropic’s product is not a press release, it is a distribution decision — and it reflects a market that is large enough that the dominant strategy for any individual player is to make their layer indispensable, not to win at every layer.

For developers, the implication is practical: the teams treating AI coding tools as a pick-one decision are leaving capability on the table. The teams that have figured out what each layer is good for and composed them accordingly are operating at a different productivity level.

The three-layer stack was not designed by any committee. It emerged from developers solving real problems with the tools available. That is usually how the important infrastructure patterns happen.

Sources

The Three Layers#

Layer 1: Orchestration (Cursor)#

Layer 2: Execution (Claude Code and Codex)#

Layer 3: Review (Cross-Provider Verification)#

The Stack Developers Are Actually Building#

Why Claude Code Wins at the Execution Layer (For Now)#

What the Convergence Signals#

Related