---
title: "Claude Opus 4.7: 87.6% SWE-bench, Implicit-Need Tests, Same Price"
date: 2026-04-17
tags: ["Claude","Anthropic","model release","SWE-bench","agentic coding","benchmarks"]
categories: ["AI Tools","Industry"]
summary: "Anthropic shipped Claude Opus 4.7 on April 16, 2026. SWE-bench Verified jumps nearly 7 points to 87.6%, SWE-bench Pro leaps from 53.4% to 64.3%, and the model is the first Claude to pass implicit-need tests. Pricing stays flat at $5/$25 per million tokens."
---


Claude Opus 4.7 landed yesterday, April 16, 2026 — and for once, the headline isn't "new model costs more." Pricing holds at $5/$25 per million tokens (input/output), the same as Opus 4.6. What changed is the capability ceiling, and on the benchmarks that matter for agentic coding, the jump is real.

## The Numbers

**SWE-bench Verified** climbs from 80.8% to **87.6%** — a nearly 7-point improvement that puts Opus 4.7 ahead of both GPT-5.4 and Gemini 3.1 Pro (the latter at 80.6%). More meaningfully, **SWE-bench Pro** — the harder, multi-language variant Anthropic itself helped design to be contamination-resistant — jumps from 53.4% to **64.3%**, leapfrogging GPT-5.4 (57.7%) and Gemini (54.2%).

To put the Pro number in perspective: every frontier model clustered around 53–58% on SWE-bench Pro as recently as two months ago. A 64.3% score is a genuine step change, not rounding-error progress.

**MCP-Atlas**, the agentic tool-use benchmark tracking multi-step agent behavior through Model Context Protocol workflows, hits **77.3%** for Opus 4.7, compared to 75.8% for Opus 4.6, 73.9% for Gemini 3.1 Pro, and 68.1% for GPT-5.4.

**OSWorld-Verified**, which tests computer-use tasks against real desktop interfaces, climbs from 72.7% to **78.0%** — within 1.6 points of the Mythos Preview at 79.6%, and ahead of GPT-5.4's 75.0%.

## What Actually Improved

Anthropic is specific about the improvements, which is more useful than the usual "stronger, better, smarter" release language.

**Fewer tool errors in agentic loops.** The model produces a third of the tool errors of Opus 4.6 on complex multi-step tasks. In agentic workflows where a model is running dozens of tool calls in sequence, error accumulation is the enemy — errors compound, context gets polluted, and the agent recovers badly or not at all. Cutting tool errors by 67% has a multiplicative effect on task completion.
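Why "multiplicative"? If any single tool error can derail a run, a long agent loop succeeds only when *every* step is clean, so small per-step improvements compound. A minimal sketch — the per-step error rates here are illustrative, not Anthropic's published figures:

```python
# Sketch: how per-step tool-error rates compound over a long agentic run.
# The 6% and 2% rates are made-up numbers for illustration only.

def clean_run_odds(per_step_error_rate: float, steps: int) -> float:
    """Probability that `steps` sequential tool calls all succeed,
    assuming independent errors and that any error derails the task."""
    return (1 - per_step_error_rate) ** steps

# Hypothetical: a 6% per-step error rate cut by two-thirds to 2%.
before = clean_run_odds(0.06, 40)  # ~0.084
after = clean_run_odds(0.02, 40)   # ~0.446
print(f"40-step task at 6% error rate: {before:.1%} clean runs")
print(f"40-step task at 2% error rate: {after:.1%} clean runs")
```

Under these toy numbers, a two-thirds cut in per-step errors roughly quintuples the odds of a 40-step run finishing without a single tool error — which is the multiplicative effect in question.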

**14% improvement on complex multi-step workflows, using fewer tokens.** Anthropic isn't trading efficiency for accuracy here. Opus 4.7 completes more complex tasks while consuming fewer tokens — a combination that almost never shows up in model releases because it's easier to throw more compute at hard problems than to solve them more efficiently.

**Cursor's internal benchmark (CursorBench) jumped from 58% to 70%.** Cursor's team runs one of the most honest third-party evaluations because it's measured against their actual user workflows, not academic datasets. A 12-point jump on a real-world coding benchmark is significant.

**One production partner saw 13% higher resolution rate** on a 93-task coding benchmark, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve at all. Tasks that were simply out of reach for the previous generation are now solvable.

## Implicit-Need Tests: The Understated Breakthrough

The benchmark most likely to be glossed over in coverage is the one that might matter most: Opus 4.7 is the first Claude model to pass what Anthropic calls **implicit-need tests**.

These are tasks where the model must figure out what tools or actions are needed, rather than being told explicitly. The distinction matters enormously in production agentic workflows. When you're running Claude Code autonomously — working through a multi-step spec, debugging a failing CI run, or refactoring a module it's never seen before — you cannot enumerate every tool call in advance. The model has to infer what's required from context.

Previous models handled explicit instruction well. "Use the filesystem tool to read X, then the search tool to find Y" — fine. But "figure out what's causing the test failure and fix it" requires inferring which tools to reach for, in what order, with what parameters. That's the difference between a capable model and a capable agent. Opus 4.7 passing implicit-need tests is Anthropic flagging that the gap is narrowing.

## Vision: 3x the Resolution

Opus 4.7 accepts images up to **3.75 megapixels** — about three times the limit of prior Claude models (approximately 1.25MP). In practical terms: full-resolution screenshots of multi-monitor setups, high-density UI designs, detailed diagrams, and dense code in screenshots are now readable without pre-scaling.

This matters specifically for computer-use workflows where the agent is reading the actual screen, not a compressed thumbnail. If you've seen Claude miss fine-grained UI details on OSWorld-style tasks, the resolution increase is directly relevant.
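For teams preprocessing screenshots today, the practical question is whether an image still needs downscaling before it goes to the model. A minimal sketch — the 3.75 MP figure is from the release; the helper itself is illustrative, not part of any Anthropic SDK:

```python
# Sketch: checking an image against a megapixel cap before sending it.
# OPUS_4_7_MAX_MP is the published limit; the helper is hypothetical.

OPUS_4_7_MAX_MP = 3.75   # new Opus 4.7 limit
PRIOR_MAX_MP = 1.25      # approximate limit of earlier Claude models

def needs_downscale(width_px: int, height_px: int, max_mp: float) -> bool:
    """True if the image exceeds the megapixel cap."""
    return (width_px * height_px) / 1_000_000 > max_mp

# A 2560x1440 screenshot is ~3.69 MP: over the old cap, under the new one.
print(needs_downscale(2560, 1440, PRIOR_MAX_MP))     # True
print(needs_downscale(2560, 1440, OPUS_4_7_MAX_MP))  # False
```

The concrete upshot: a single 1440p screenshot, which previously had to be scaled down (losing fine UI detail), now fits under the cap untouched.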

## Multi-Agent Coordination

Opus 4.7 introduces native multi-agent coordination — the ability to orchestrate parallel AI workstreams rather than processing tasks sequentially. This is the model-level capability that Anthropic's Claude Cowork and Managed Agents infrastructure have been building toward. The orchestrator model can now natively spawn, monitor, and synthesize work from parallel subagents rather than treating every task as a single-threaded problem.

Combined with the Claude Code desktop redesign that shipped three days ago (parallel sessions via git worktrees), Opus 4.7's multi-agent coordination is the model layer catching up with the tooling layer.
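The pattern the orchestrator is taking over is the familiar fan-out/synthesize loop. A minimal sketch of that pattern — `run_subagent` is a hypothetical stand-in for dispatching work to a subagent; nothing here is an actual Anthropic API:

```python
# Sketch of the fan-out/synthesize pattern described above.
# `run_subagent` is a hypothetical placeholder, not a real SDK call.
import asyncio

async def run_subagent(task: str) -> str:
    # Placeholder: a real implementation would invoke a model here.
    await asyncio.sleep(0)
    return f"result for: {task}"

async def orchestrate(tasks: list[str]) -> list[str]:
    """Spawn subagents in parallel, then gather results for synthesis."""
    return await asyncio.gather(*(run_subagent(t) for t in tasks))

results = asyncio.run(
    orchestrate(["refactor module", "write tests", "update docs"])
)
```

What changes with Opus 4.7 is where this loop lives: previously the spawn/monitor/merge logic sat in tooling like this, outside the model; now the model can drive it natively.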

## Pricing: Still $5/$25

The pricing line is worth repeating because it's unusual: **$5 per million input tokens, $25 per million output tokens**, identical to Opus 4.6.

In a market where model releases routinely come with 20–40% price increases, Anthropic keeping the price flat while delivering genuine capability improvements is either a deliberate competitive move or a sign that inference-efficiency gains are absorbing the cost of the added capability. Possibly both.

For teams that have been holding off on Opus for cost reasons — running Sonnet 4.6 for most tasks, reserving Opus for the hard stuff — the economics of running Opus more aggressively just got better.
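To make the economics concrete, the per-request arithmetic is simple. A back-of-envelope sketch at the published rates — the token counts below are made-up workload numbers, not measurements:

```python
# Back-of-envelope cost at Opus pricing: $5 input / $25 output
# per million tokens. Token counts are hypothetical workload figures.

def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float = 5.0, out_rate: float = 25.0) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical agentic session: 400k tokens in, 60k tokens out.
print(f"${cost_usd(400_000, 60_000):.2f}")  # $3.50
```

At these rates, even a context-heavy agentic session stays in the single-digit-dollar range — which is why flat pricing plus lower token consumption shifts the Sonnet-vs-Opus routing decision.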

## Availability

Opus 4.7 is live across:

- **Claude API** (claude.com)
- **Amazon Bedrock** — with Bedrock's zero-operator-access guarantee, meaning neither Anthropic nor AWS operators see your prompts or responses
- **Google Cloud Vertex AI**
- **Microsoft Azure AI Foundry**
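For API users, a request to the new model has the usual Messages API shape. A minimal sketch — the model id `claude-opus-4-7` is an assumption based on Anthropic's naming pattern; verify it against the published model list before use:

```python
# Sketch: the shape of a Messages API request body for Opus 4.7.
# The model id "claude-opus-4-7" is assumed, not confirmed.
import json

payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "messages": [
        {"role": "user",
         "content": "Figure out what's causing the test failure and fix it."}
    ],
}

# With the official Python SDK this would be sent as:
#   client.messages.create(**payload)
body = json.dumps(payload)
```

The same payload shape applies on Bedrock, Vertex, and Foundry, modulo each platform's own model identifier and authentication.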

The v2.1.94 Claude Code update (released alongside the model) automatically defaults Bedrock, Vertex, and Foundry users to **high effort** instead of the previously controversial medium default — an implicit acknowledgment that the effort controversy from last week was a legitimate concern for enterprise users.

## What This Means for Claude Code Workflows

If you're running Claude Code in autonomous or semi-autonomous modes, Opus 4.7 changes the calculus in three ways:

**1. Fewer restarts.** The tool error reduction means complex multi-step agents are less likely to go off the rails mid-task. Every tool error is a branch point where the agent either recovers correctly or drifts. Fewer errors means fewer unrecoverable states.

**2. Harder tasks become viable.** The 64.3% SWE-bench Pro score and the implicit-need test results suggest that tasks requiring genuine inference about what to do next — not just instruction-following — are now in reach. The boundary of "what can I delegate to Claude Code" just moved outward.

**3. The orchestrator role gets stronger.** Multi-agent coordination as a native model capability, combined with lower error rates, makes Opus 4.7 a better orchestrator for multi-agent Claude Code setups. If you're running Claude Code Agent Teams, the lead agent just got meaningfully better at managing its subordinates.

Anthropic is building toward a model that doesn't just do what it's told — it figures out what needs to be done and does it. With Opus 4.7, that trajectory is visible in the benchmark numbers.

---

**Sources:**
- [Introducing Claude Opus 4.7 — Anthropic](https://www.anthropic.com/news/claude-opus-4-7)
- [Claude Opus 4.7 available in Amazon Bedrock — AWS](https://aws.amazon.com/blogs/aws/introducing-anthropics-claude-opus-4-7-model-in-amazon-bedrock/)
- [Claude Opus 4.7 leads on SWE-bench and agentic reasoning — The Next Web](https://thenextweb.com/news/anthropic-claude-opus-4-7-coding-agentic-benchmarks-release)
- [Claude Opus 4.7 Benchmarks Explained — Vellum AI](https://www.vellum.ai/blog/claude-opus-4-7-benchmarks-explained)
- [Claude Opus 4.7 with automated cybersecurity safeguards — Help Net Security](https://www.helpnetsecurity.com/2026/04/16/claude-opus-4-7-released/)
- [Claude Code Changelog — Claudefast](https://claudefa.st/blog/guide/changelog)

