Most PR review tools have a false-positive problem. They generate long lists of findings, many of which are stylistic, non-actionable, or wrong. Developers learn to skim the output, stop reading it carefully, and eventually stop trusting it. The tool becomes noise.
Anthropic’s Claude Code Review, which reached general availability on May 6 at Code with Claude SF, was designed around the premise that the false-positive problem is the only problem worth solving. The multi-agent architecture exists specifically to filter noise before it reaches a developer. Whether it succeeds, and whether the price is justified, depends on your team’s size, PR volume, and how much you already trust automated review.
Here is what the tool actually does, what it costs, and how to think about whether it belongs in your workflow.
## The Architecture: Three Passes, One Comment
When you open a pull request with Code Review enabled, Anthropic dispatches a team of agents rather than a single model pass. The pipeline has three stages (a conceptual sketch follows the list):
1. Parallel bug-finding. Multiple agents work through the PR simultaneously, looking for logic errors, security issues, race conditions, missing error handling, and implementation mistakes. The parallelism matters: a single agent working sequentially through a 1,000-line PR has less capacity for deep analysis on any one section than specialized agents focusing on different dimensions.
2. Verification pass. Each candidate finding goes through a second agent that verifies whether the issue is real, context-valid, and worth surfacing. This is the false-positive filter. An agent that flags a potential null dereference that is actually guarded by an upstream check gets corrected here, before the finding reaches the PR.
3. Severity ranking. Verified findings are ranked. Critical bugs surface at the top; low-severity suggestions are included but positioned below. The output is a prioritized list, not a flat dump.
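Anthropic has not published its orchestration code, so the following is only a minimal conceptual sketch of the find → verify → rank shape in Python. Every function body, field name, and focus-area string here is a hypothetical stand-in for a model call, not Anthropic’s actual interface:

```python
from concurrent.futures import ThreadPoolExecutor

# Dimensions the parallel finders focus on, per the pipeline description.
FOCUS_AREAS = ["logic errors", "security issues", "race conditions",
               "missing error handling", "implementation mistakes"]

def find_issues(diff: str, focus: str) -> list[dict]:
    """Stage 1: one specialized agent scans the whole diff for `focus` issues.
    A real implementation would be a model call; stubbed out here."""
    return []  # e.g. [{"line": 120, "severity": 3, "claim": "unguarded None"}]

def is_real(diff: str, finding: dict) -> bool:
    """Stage 2: a second agent re-checks the finding against codebase context,
    e.g. whether a flagged null dereference is guarded by an upstream check."""
    return True  # stub; this is where the false-positive filter lives

def review(diff: str) -> list[dict]:
    # Stage 1: parallel bug-finding, one agent per focus area.
    with ThreadPoolExecutor() as pool:
        per_focus = pool.map(lambda f: find_issues(diff, f), FOCUS_AREAS)
    candidates = [c for findings in per_focus for c in findings]
    # Stage 2: verification filters noise before anything reaches the PR.
    verified = [c for c in candidates if is_real(diff, c)]
    # Stage 3: severity ranking; critical bugs surface at the top.
    return sorted(verified, key=lambda c: c["severity"], reverse=True)
```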
The result is two outputs on the pull request: a single overview comment summarizing findings by severity, and inline comments on specific lines for issues that require per-line context. Anthropic’s stated design goal is that the overview comment should be readable in two minutes and actionable without reading the full inline comments first.
The entire process takes approximately 20 minutes. It is asynchronous — you open the PR, continue working, and the review appears when it is done.
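Because the review lands as ordinary PR comments via the GitHub App, consuming it asynchronously needs nothing beyond GitHub’s standard REST API. A sketch that polls the issue-comments endpoint until the overview comment appears; the bot login below is a placeholder assumption, so check what the app actually posts as in your repository:

```python
import time
import requests

BOT_LOGIN = "claude-code-review[bot]"  # placeholder; verify the app's real login

def wait_for_overview(owner: str, repo: str, pr: int, token: str,
                      timeout_s: int = 30 * 60, poll_s: int = 60) -> str | None:
    """Poll the PR's top-level comments until the review bot posts, or time out."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr}/comments"
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/vnd.github+json"}
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        for comment in requests.get(url, headers=headers, timeout=30).json():
            if comment["user"]["login"] == BOT_LOGIN:
                return comment["body"]  # the severity-ranked overview
        time.sleep(poll_s)
    return None
```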
## What the Performance Numbers Say
Anthropic published internal benchmarks at launch. The two most relevant figures:
- Large PRs (1,000+ lines changed): 84% receive at least one finding, averaging 7.5 issues
- Small PRs (under 50 lines): 31% receive findings, averaging 0.5 issues
The large-PR number is the one to focus on. On complex changes — the kind where a human reviewer would spend 45 minutes and still miss things — Code Review finds something meaningful in 5 out of 6 cases and returns an average of 7 to 8 ranked issues. That is a non-trivial signal density for a 20-minute asynchronous process.
The small-PR number is less impressive, but expected: under 50 lines, the verification pass correctly discards most candidates for lack of supporting context. Do not treat Code Review as a substitute for human review on small, targeted changes.
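To turn the benchmarks into a weekly expectation, multiply your PR mix by the published per-size averages. A back-of-envelope sketch; the 70/30 small/large split and the 20-PR workload below are made-up examples, so substitute your own numbers:

```python
# Published averages per PR size bucket (issues per reviewed PR).
AVG_FINDINGS = {"small": 0.5, "large": 7.5}

def expected_findings_per_week(pr_mix: dict[str, int]) -> float:
    """Expected verified findings per week for a given small/large PR mix."""
    return sum(count * AVG_FINDINGS[size] for size, count in pr_mix.items())

# 20 PRs/week, 70% small and 30% large (illustrative split).
print(expected_findings_per_week({"small": 14, "large": 6}))
# 14 * 0.5 + 6 * 7.5 = 52.0 findings/week
```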
## Pricing: What “Billed Separately” Actually Means
Each review costs $15–25, scaling with PR size and codebase complexity. This is billed separately through Anthropic’s “extra usage” mechanism and does not count against your plan’s included usage.
This matters for budgeting. If your team ships 20 PRs per week, you are looking at $300–$500/week, or roughly $1,200–$2,000/month, before any volume negotiation on an Enterprise plan. For a 10-person team where each engineer’s time costs the company $50–$100 per hour, that is equivalent to 12–40 engineer-hours of review per month — roughly 1.2 to 4 hours per engineer.
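The budgeting arithmetic is simple enough to encode. A toy cost model using the published per-review range; the 20-PR workload and the loaded hourly rates are the illustrative assumptions above, not vendor figures:

```python
COST_PER_REVIEW = (15, 25)   # published baseline range, USD
WEEKS_PER_MONTH = 4          # matches the rough monthly figures above

def monthly_cost_usd(prs_per_week: int) -> tuple[int, int]:
    lo, hi = COST_PER_REVIEW
    return (lo * prs_per_week * WEEKS_PER_MONTH,
            hi * prs_per_week * WEEKS_PER_MONTH)

def equivalent_review_hours(monthly_usd: int, hourly_rate_usd: int) -> float:
    """Engineer-hours of review the same spend would buy."""
    return monthly_usd / hourly_rate_usd

lo, hi = monthly_cost_usd(20)                # (1200, 2000)
print(equivalent_review_hours(lo, 100))      # 12.0 hours/month
print(equivalent_review_hours(hi, 50))       # 40.0 hours/month
```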
Whether that math works depends on your review patterns. If your team’s PRs are large and complex and reviewers frequently miss bugs that reach production, $15–25 per PR is cheap. If your team ships small, well-tested PRs with high review coverage already, the cost-benefit calculus is weaker.
Enterprise teams can negotiate per-seat pricing structures. The $15–25 figure is the published baseline; at scale the effective per-review cost drops.
## Availability: Team and Enterprise Only
Code Review is not available on Pro or Max plans. It is a Team and Enterprise feature, with a GitHub integration that posts comments directly to PRs via the GitHub App.
This is a deliberate positioning choice. The feature requires codebase context (not just the diff), which means it needs the Claude Code codebase integration already configured — something that is standard in Team and Enterprise deployments but less commonly set up on individual Pro accounts.
Anthropic claimed at launch that every internal engineering team at the company uses Code Review. That claim doubles as a statement about the intended customer profile: teams shipping production code at speed where review bandwidth is a real constraint, not individual developers working on side projects.
## How It Compares
| Tool | Architecture | Availability | Pricing | Async? |
|---|---|---|---|---|
| Claude Code Review | Multi-agent (find → verify → rank) | Team / Enterprise | $15–25 | Yes (20 min) |
| /ultrareview | Single dedicated cloud session | Pro / Max (3 free) / Team / Enterprise | Included (Pro/Max) or extra usage | Yes |
| Cursor Bugbot | Single-pass static analysis + LLM | Cursor Pro+ | Included in plan | No (inline) |
| Greptile | Codebase-aware semantic search + LLM | All plans | $29–$99/mo flat | Yes |
| CodeRabbit | Rules + LLM hybrid, configurable | All plans | $12–$24/user/mo | Yes |
A few things to note in this comparison:
/ultrareview is a different product category. It is a dedicated Claude session that produces a long-form review document — closer to a senior engineer’s written code review than an automated scan. It is better for catching architectural issues and reasoning about tradeoffs; Code Review is better for catching bugs at speed. They are complementary, not competing.
Cursor Bugbot is the closest structural competitor, but it runs as a single pass without a verification step and is embedded in the Cursor IDE context, not a GitHub integration. It is better suited to inline suggestions during active coding than to async PR-level review.
Greptile and CodeRabbit are both configurable, rules-aware tools that work well at the process layer: enforcing conventions and catching common mistakes at low cost per PR. They are not multi-agent verification systems. At $12–$99 per month (flat for Greptile, per user for CodeRabbit), they cost more per review than per-PR pricing at very low PR volume and far less at high volume.
The matrix for choosing, encoded as a toy routing function after the list:
- Code Review ($15–25/PR) → high-complexity PRs, security-sensitive codebases, teams where production bugs are expensive
- /ultrareview → architectural decisions, large refactors, onboarding a new area of the codebase
- Greptile/CodeRabbit → lightweight convention enforcement, all PR sizes, cost-sensitive teams
- Cursor Bugbot → inline suggestions during development, not async PR review
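The routing sketch below restates the matrix in code. The thresholds (line counts, incident-cost cutoff) are illustrative assumptions, not vendor guidance:

```python
def pick_review_tool(lines_changed: int, large_refactor: bool,
                     incident_cost_usd: int, during_active_dev: bool) -> str:
    """Toy encoding of the decision matrix; all thresholds are made up."""
    if during_active_dev:
        return "Cursor Bugbot"        # inline suggestions while coding
    if large_refactor:
        return "/ultrareview"         # long-form architectural review
    if incident_cost_usd > 10_000 or lines_changed > 1_000:
        return "Claude Code Review"   # multi-agent find -> verify -> rank
    return "Greptile or CodeRabbit"   # lightweight convention enforcement

print(pick_review_tool(1_500, False, 50_000, False))  # Claude Code Review
```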
## Is It Worth It?
The honest answer is: it depends on what a production bug costs you.
For a consumer app with a fast rollback cycle and low blast radius, $15–25 per PR is hard to justify against the alternatives. For a fintech platform, a healthcare system, or infrastructure code where a missed race condition or authentication bug creates a serious incident, $25 to catch it before merge is not a discussion worth having — it is obviously worth it.
The more interesting question is what Code Review changes structurally. If developers know that a verification pass is going to run on every PR, the review bandwidth problem changes shape. A team of four engineers cannot review every PR at depth. A multi-agent system running in parallel can. That changes what human reviewers focus on: architectural intent, product logic, and edge cases that require domain knowledge, rather than bug-hunting, which the agents do more reliably than a time-pressed human.
Anthropic’s stated goal is not to replace human code review but to change what human reviewers spend their time on. Based on the architecture and performance numbers, that is the correct framing.
At $15–25/PR with a 20-minute async cycle, the adoption friction is low enough that the right move for most Team and Enterprise accounts is to run it in parallel with your existing process for a sprint, measure the findings against what your human reviewers caught, and let the data decide.
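Scoring that sprint is also straightforward. One crude way to measure overlap, assuming you can extract (file, line) pairs from both the agent’s and your reviewers’ PR comments; the ±3-line match window is an arbitrary choice:

```python
Finding = tuple[str, int]  # (file_path, line_number) from a PR comment

def trial_score(agent: list[Finding], human: list[Finding],
                window: int = 3) -> dict[str, int]:
    """Count findings flagged by both, by the agent only, and by humans only."""
    def near(a: Finding, b: Finding) -> bool:
        return a[0] == b[0] and abs(a[1] - b[1]) <= window
    both = sum(any(near(a, h) for h in human) for a in agent)
    return {"both": both,
            "agent_only": len(agent) - both,
            "human_only": sum(not any(near(h, a) for a in agent) for h in human)}

print(trial_score([("auth.py", 42), ("db.py", 10)], [("auth.py", 44)]))
# -> {'both': 1, 'agent_only': 1, 'human_only': 0}
```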
Sources:
- Code Review for Claude Code — Anthropic Blog
- Code Review — Claude Code Docs
- Anthropic Code Review for Claude Code: Multi-Agent PR Reviews, Pricing, Setup, and Limits — DEV Community
- Claude Code Review vs Bugbot vs Greptile vs CodeRabbit — FindSkill.ai
- Anthropic Charges $25 Per PR — Claude Code Review Backlash — Level Up Coding
- Is Claude Code Review Worth $15–25 Per PR? (2026 Verdict) — BuildFastWithAI
- Live blog: Code w/ Claude 2026 — Simon Willison
- Claude Code PR reviews are here, just $15–25 — LinkedIn / John Crickett