
The MCP Token Tax: 32x Overhead, $51K/Month, and Four Ways to Fix It

Author
Florent Clairambault
CTO & Software engineer

You adopted MCP because the ecosystem is genuinely excellent: 6,400+ servers, first-class support in Claude Code, Cursor, Copilot, and every other serious agent platform, and the promise of connecting your AI to GitHub, Jira, Slack, Postgres, and your internal APIs without building custom integration code.

What wasn’t on the label: on simple queries, MCP adds 32x token overhead compared to doing the same tasks via direct CLI calls. Three independent analyses published in 2026 point in the same direction, with per-task multipliers ranging from 12x to 40x. At 10,000 requests per day (a conservative estimate for a 10-engineer team running agent-heavy workflows), one published estimate puts the bill at $120/month via CLI and $51,000/month via a heavy MCP setup for identical workloads.

The good news is this is a solved problem with known mitigations. The bad news is most teams aren’t running them.

The Root Cause: Schema Injection on Every Turn

The overhead is architectural, not a bug. Here’s the mechanism.

When Claude Code, Cursor, or any MCP client connects to a server, it receives the full definition of every tool the server exposes: a name, a description, and a JSON Schema for the inputs, delivered over JSON-RPC. GitHub’s official MCP server ships 43 tools: push branch, create PR, list issues, search repos, get file contents, and 38 more. Those 43 tool definitions are injected into the conversation context on every single turn, regardless of which tools the current task actually uses and regardless of whether the task needs any tools at all.
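
To make the mechanism concrete, here is a minimal sketch of what a client carries on each turn. The tool definition is hypothetical (only its general shape of name, description, and input schema reflects MCP), and the 4-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
import json

# Hypothetical tool definition; the field values are made up for illustration.
EXAMPLE_TOOL = {
    "name": "create_branch",
    "description": "Create a new branch in a repository.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository in owner/name form"},
            "branch": {"type": "string", "description": "Branch name to create"},
        },
        "required": ["repo", "branch"],
    },
}


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def registry_tokens_per_turn(tools: list[dict]) -> int:
    """Estimated tokens spent on tool definitions for a single turn."""
    return sum(estimate_tokens(json.dumps(tool)) for tool in tools)


def registry_tokens_per_session(tools: list[dict], turns: int) -> int:
    """Definitions ride along on every turn, so the cost scales with turn count."""
    return registry_tokens_per_turn(tools) * turns


if __name__ == "__main__":
    registry = [EXAMPLE_TOOL] * 43  # stand-in for a 43-tool server
    print(registry_tokens_per_turn(registry))
    print(registry_tokens_per_session(registry, turns=5))
```

Swapping in your own server’s tool list and a real tokenizer would give numbers worth acting on; the point here is only that the cost is paid per turn, whether or not a tool is called.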

Scalekit ran 75 benchmark tasks on identical GitHub workflows, comparing direct CLI vs. MCP from GitHub’s Copilot server (statistically significant at p < 0.05). The simplest task — “what programming language does this repository use?” — required 1,365 tokens via CLI and 44,026 tokens via MCP. The model used 1–2 tools to answer the question. It paid for all 43 schemas anyway. That’s 32x overhead on the most trivial query possible.

The overhead compounds as sessions lengthen. Tool results are appended to conversation history and remain visible on every subsequent turn. A task that takes five MCP round-trips accumulates the schema overhead on every turn plus the verbatim output of every prior tool call. Long agentic sessions in complex tool environments are the worst-case scenario.
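
A toy model shows how this compounds. It assumes every turn resends the schema block and the prompt, and that each tool call appends a fixed-size result to the history. The specific numbers (43,000 schema tokens, 1,000 prompt tokens, 2,000 tokens per tool result) are illustrative assumptions, not measurements.

```python
def session_input_tokens(turns: int, schema_tokens: int,
                         prompt_tokens: int, result_tokens: int) -> int:
    """Total input tokens billed across a session under the toy model."""
    total = 0
    history = 0  # tool results accumulated from earlier turns
    for _ in range(turns):
        # Each turn pays for the schemas, the prompt, and all prior tool output.
        total += schema_tokens + prompt_tokens + history
        # The tool result from this turn is visible on every later turn.
        history += result_tokens
    return total


if __name__ == "__main__":
    print(session_input_tokens(turns=5, schema_tokens=43_000,
                               prompt_tokens=1_000, result_tokens=2_000))
    # prints 240000: 5 x 44,000 fixed cost plus 20,000 of replayed tool output
```

In this model the schema term grows linearly with turn count while the replayed history grows quadratically, which is why long sessions with chatty tools are hit hardest.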

The Numbers, Attributed Correctly

Three data sources are cited in most coverage of this problem, and they’re frequently conflated. Here’s what each one actually says:

Scalekit (the 32x source): 75 benchmark runs comparing CLI vs. MCP on GitHub tasks. The 1,365-token CLI cost vs. 44,026-token MCP cost for the language-detection task is the direct source of the “32x” figure. At p < 0.05 across 75 runs, this is the most methodologically rigorous finding. Scalekit also found 72% MCP task-completion reliability vs. 100% CLI — the failures were TCP-level timeouts to GitHub’s remote MCP server.

n1n.ai, June 24, 2026 (the 12x–40x range): Controlled experiments across three agent workflows (code review, PR triage, documentation update) using n1n.ai’s unified API gateway. Found 12x–40x per-task cost multipliers when comparing lean direct API calls vs. a 6-server MCP setup. Code review: $0.003 vs. $0.11 (~37x). PR triage: $0.02 vs. $0.38 (~19x). Documentation update: $0.001 vs. $0.04 (~40x).

OnlyCLI (the $51K/$120 figures): Applied the overhead math to enterprise scale. At 10,000 requests per day with a heavy MCP setup (GitHub + Slack + Sentry, ~143K tokens of schema per request), monthly cost: ~$51,000. The CLI equivalent for identical workloads: ~$120/month. These figures describe a specific scenario (10K daily requests with three multi-tool MCP servers), not a universal baseline — but the underlying math is internally consistent.

A GitHub issue (#2808, filed May 28, 2026) provided production data from 2,600 conversations over 22 days: approximately 10,000 tokens of schema overhead per conversation’s first turn, ~$0.15 each, and an estimated $390 in first-turn schema costs across the 22-day sample.
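
The figures quoted in this section can be cross-checked with a few lines of arithmetic. This is a sanity check, not a reproduction of any of the studies; the 30-day month is an assumption, and the "implied price" is simply what falls out of dividing the quoted dollar figure by the quoted token volume.

```python
# Scalekit: MCP tokens divided by CLI tokens for the language-detection task.
scalekit_ratio = 44_026 / 1_365  # about 32.25

# n1n.ai: (lean direct API cost, MCP cost) per task, in USD.
n1n_costs = {
    "code review": (0.003, 0.11),
    "PR triage": (0.02, 0.38),
    "documentation update": (0.001, 0.04),
}
n1n_ratios = {task: round(mcp / lean, 1) for task, (lean, mcp) in n1n_costs.items()}

# OnlyCLI: 10,000 requests/day with ~143K schema tokens per request.
DAYS_PER_MONTH = 30  # assumption
monthly_requests = 10_000 * DAYS_PER_MONTH
monthly_schema_tokens = monthly_requests * 143_000
implied_usd_per_million = 51_000 / (monthly_schema_tokens / 1_000_000)
cli_usd_per_request = 120 / monthly_requests

# GitHub issue: 2,600 conversations at roughly $0.15 of first-turn schema cost.
github_issue_total = round(2_600 * 0.15, 2)

if __name__ == "__main__":
    print(f"Scalekit ratio: {scalekit_ratio:.2f}x")
    print(f"n1n.ai ratios: {n1n_ratios}")
    print(f"OnlyCLI implied price: ${implied_usd_per_million:.2f} per million tokens")
    print(f"OnlyCLI CLI cost per request: ${cli_usd_per_request:.4f}")
    print(f"GitHub issue first-turn total: ${github_issue_total:.2f}")
```

Note that the $51K and $120 figures are about 425x apart, far more than 32x, because the OnlyCLI scenario stacks three servers’ worth of schemas on each request.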

Four Mitigations That Work

1. Right-size your tool registry per session.

The largest lever: don’t load all tools into every session. Claude Code’s plugin system (allowedTools in .claude/settings.json) lets you specify which MCP tools are active per project. Backend debugging sessions don’t need Figma MCP. Security scanning sessions don’t need your marketing analytics connector.

The math is linear: at 1,000 tokens per tool definition (the GitHub issue’s production measurement), cutting your registry from 43 tools to 10 saves 33,000 tokens per turn. Over a 50-turn agentic session, that’s 1.65 million tokens. At Opus 4.8 pricing, roughly $8.25 per session — which adds up fast on agent-heavy teams.

Target: ≤10 tools per session for general work. For focused tasks (security scan, dependency audit), expose only the tools that task requires.
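
The savings arithmetic is easy to rerun with your own numbers. In this sketch the $5 per million input tokens is the rate implied by the $8.25 and 1.65 million figures above, not a quoted price list entry; replace it with whatever you actually pay.

```python
def registry_savings(tools_before: int, tools_after: int, tokens_per_tool: int,
                     turns: int, usd_per_million_tokens: float):
    """Return (tokens saved per turn, tokens saved per session, USD saved per session)."""
    saved_per_turn = (tools_before - tools_after) * tokens_per_tool
    saved_per_session = saved_per_turn * turns
    usd_saved = saved_per_session / 1_000_000 * usd_per_million_tokens
    return saved_per_turn, saved_per_session, usd_saved


if __name__ == "__main__":
    per_turn, per_session, usd = registry_savings(
        tools_before=43, tools_after=10, tokens_per_tool=1_000,
        turns=50, usd_per_million_tokens=5.0,  # assumed rate, see note above
    )
    print(per_turn, per_session, round(usd, 2))
```

Because every term multiplies, halving the tool count or halving the tokens per tool has the same effect, which is why the next mitigation stacks well with this one.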

2. Compress your tool schemas.

MCP server authors write verbose descriptions for human readability in documentation. Production agents pay for every word at runtime. This creates an easy optimization that most teams skip.

A description field reading “The full name of the Git branch you want to create, following kebab-case naming convention — for example, ‘feature/add-user-authentication’” costs ~35 tokens. “Branch name to create (kebab-case)” costs 7 tokens. Both descriptions are unambiguous to a capable model. One is 5x more expensive.

If you maintain internal MCP servers, run a schema audit. The n1n.ai analysis found individual tool definitions consuming 500–1,500 tokens each. Compression typically yields 40–60% reduction in per-tool schema size with no loss of model performance.
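
A schema audit can start as a script that flags long descriptions. The sketch below assumes tool definitions shaped like MCP’s (name, description, inputSchema), uses a rough 4-characters-per-token estimate rather than a real tokenizer, and picks a 15-token threshold arbitrarily. The example tool is hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption)."""
    return max(1, round(len(text) / 4))


def iter_descriptions(tool: dict):
    """Yield (field, text) for the tool description and each parameter description."""
    yield "(tool)", tool.get("description", "")
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param, spec in properties.items():
        yield param, spec.get("description", "")


def audit_descriptions(tools: list[dict], max_tokens: int = 15):
    """Return (tool name, field, estimated tokens) for verbose descriptions, largest first."""
    findings = []
    for tool in tools:
        for field, text in iter_descriptions(tool):
            estimate = estimate_tokens(text)
            if estimate > max_tokens:
                findings.append((tool["name"], field, estimate))
    return sorted(findings, key=lambda finding: finding[2], reverse=True)


# Hypothetical tool with one verbose and one terse parameter description.
EXAMPLE_TOOLS = [{
    "name": "create_branch",
    "description": "Create a branch.",
    "inputSchema": {"type": "object", "properties": {
        "branch": {"type": "string", "description": (
            "The full name of the Git branch you want to create, following "
            "kebab-case naming convention, for example "
            "'feature/add-user-authentication'")},
        "repo": {"type": "string", "description": "Repository (owner/name)"},
    }},
}]

if __name__ == "__main__":
    for name, field, estimate in audit_descriptions(EXAMPLE_TOOLS):
        print(f"{name}.{field}: ~{estimate} tokens")
```

Whatever the script flags still needs a human pass: shortening a description is only safe if the model keeps picking the right tool and filling the parameter correctly afterward.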

3. Prompt-cache stable tool registries.

Claude’s prompt caching (5-minute TTL by default, configurable to 1 hour via ANTHROPIC_CACHE_CONTROL) lets you cache the tool schema block across turns. If your tool registry doesn’t change mid-session — and it almost never does — you pay the schema transmission cost once per cache window, not once per turn.

For long agentic sessions with stable registries, cache hit rates on tool schemas consistently exceed 90%. At a 90% cache hit rate, the effective per-turn schema overhead drops from 32x to roughly 4x. Still real; no longer catastrophic.
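
One way to arrive at "roughly 4x" is to treat the 32x as a 1x baseline plus 31x of schema overhead, and to assume cache hits cost nothing. That second assumption is generous: cached reads are typically billed at a discounted rate rather than zero, so the sketch below leaves the discount as a parameter. The 0.1 value in the second call is illustrative; check your provider’s pricing.

```python
def effective_overhead(uncached_multiplier: float, hit_rate: float,
                       cached_price_fraction: float = 0.0) -> float:
    """Effective cost multiplier once part of the schema overhead is served from cache.

    cached_price_fraction is the price of a cached token relative to an
    uncached one (0.0 means cache hits are treated as free).
    """
    extra = uncached_multiplier - 1.0  # overhead on top of the 1x baseline
    billed_fraction = hit_rate * cached_price_fraction + (1.0 - hit_rate)
    return 1.0 + extra * billed_fraction


if __name__ == "__main__":
    # Cache hits treated as free: 1 + 31 * 0.10
    print(round(effective_overhead(32, 0.90), 2))        # prints 4.1
    # Cache hits billed at an assumed 10% of the normal rate: 1 + 31 * 0.19
    print(round(effective_overhead(32, 0.90, 0.1), 2))   # prints 6.89
```

Either way the overhead falls by most of an order of magnitude, but the gap between the two results is worth knowing before you budget around the lower one.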

Claude Code v2.1.186+ supports cache control settings via environment variable. For Bedrock deployments, the service tier configuration (ANTHROPIC_BEDROCK_SERVICE_TIER) includes cache priority options.

4. Route by task type, not by session.

MCP’s highest-value use is orchestration: tools that discover information (read a GitHub issue, look up a Jira ticket, query a database row) to inform what happens next. The execution work — writing code, generating tests, summarizing findings, producing documentation — typically needs none of those discovery tools. Just a context window and a capable model.

Route planning and tool-heavy orchestration through MCP-connected sessions. Route high-volume, tool-free execution through direct API calls or Claude Code’s --print mode, which bypasses the interactive session layer and doesn’t load MCP servers. Teams that implemented this pattern reduced effective MCP overhead by approximately 60% in the OnlyCLI study.
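
In code, the routing decision can be as small as a flag on each task. This sketch is a hypothetical dispatcher: the `Task` type and the route labels `mcp_session` and `direct_call` are made up for illustration and do not correspond to any real client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    needs_tools: bool  # True for discovery work (read an issue, query a row)


def choose_route(task: Task) -> str:
    """Send tool-dependent work to an MCP session, everything else to a plain call."""
    return "mcp_session" if task.needs_tools else "direct_call"


def split_by_route(tasks: list[Task]) -> dict[str, list[Task]]:
    """Group tasks by route so each group can be dispatched in bulk."""
    routes: dict[str, list[Task]] = {"mcp_session": [], "direct_call": []}
    for task in tasks:
        routes[choose_route(task)].append(task)
    return routes


if __name__ == "__main__":
    queue = [
        Task("Read the linked Jira ticket and plan the fix", needs_tools=True),
        Task("Write unit tests for the parser module", needs_tools=False),
        Task("Summarize findings for the changelog", needs_tools=False),
    ]
    for route, tasks in split_by_route(queue).items():
        print(route, [task.description for task in tasks])
```

The hard part in practice is setting `needs_tools` reliably; a planning step that emits it explicitly tends to be easier to audit than inferring it from the task text.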

Measuring Your Own Tax Rate
#

You don’t need to rely on external benchmarks. Claude Code’s Analytics API (available on Max, Team, and Enterprise plans) reports token consumption per session and per user per day. Pull a week of data, compare sessions where MCP tools were actively used vs. sessions without tool calls, and calculate the ratio.

That ratio is your actual MCP tax rate. It varies dramatically by tool registry size, schema verbosity, and session length. Most teams that run this calculation find their number is somewhere between 4x and 20x — worse than they expected, better than the 32x worst case, and very much improvable with the mitigations above.
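
Once the data is exported, the calculation itself is a few lines. The record shape below (`tokens`, `used_mcp_tools`) is an assumption about how you might flatten the export, not the Analytics API’s actual response format, and the sample values are invented.

```python
from statistics import mean


def mcp_tax_rate(sessions: list[dict]) -> float:
    """Mean tokens per MCP-using session divided by mean tokens per tool-free session."""
    with_tools = [s["tokens"] for s in sessions if s["used_mcp_tools"]]
    without_tools = [s["tokens"] for s in sessions if not s["used_mcp_tools"]]
    if not with_tools or not without_tools:
        raise ValueError("need at least one session of each kind to compare")
    return mean(with_tools) / mean(without_tools)


if __name__ == "__main__":
    # Invented sample: two MCP sessions and two tool-free sessions.
    week = [
        {"tokens": 40_000, "used_mcp_tools": True},
        {"tokens": 60_000, "used_mcp_tools": True},
        {"tokens": 5_000, "used_mcp_tools": False},
        {"tokens": 5_000, "used_mcp_tools": False},
    ]
    print(mcp_tax_rate(week))  # prints 10.0
```

A plain ratio of means will overstate the tax if MCP sessions also happen to be the longer, harder tasks, so comparing sessions of similar length or task type gives a fairer number.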

The Structural Reality

MCP’s token overhead is architectural, not a defect. The stateless RC spec (currently in candidacy) improves server reliability and enables round-robin load balancing across identical servers — it doesn’t reduce schema transmission costs. The Agentic AI Foundation (Anthropic, OpenAI, Block, Google, Microsoft, AWS, Cloudflare) is treating MCP as infrastructure: the protocol will be with us for years, and its cost structure will be with us too.

The $51K/month figure is a worst case. The 32x figure is the real overhead on the simplest possible task with the most tool-heavy MCP server. Your actual cost depends on how carefully you configure your setup.

The tools to fix this exist today. Most teams just haven’t run the numbers yet.

