
Your AI Agent Is Drowning in Tokens — Here's How to Fix It


Run cargo test in a Rust project. You get a wall of compilation progress, timing info, test names, and formatting that totals about 4,800 tokens. The information you actually need — which tests passed, which failed — fits in 11 tokens.

That’s a 99.8% noise ratio. And your AI coding agent just ate all of it.

Now multiply that across an agentic session. The agent runs git status and ls, cats a few files, runs the tests, checks the diff, runs the tests again. Each command dumps verbose output into the context window. Within thirty minutes, 80% of the context is CLI boilerplate that the agent will never reason about — but that actively interferes with its ability to reason about the 20% that matters.

This isn’t a cost problem. Well, it is — but the more insidious issue is a quality problem.

Why Noise Hurts More Than You Think

The “Lost in the Middle” Effect

The paper “Lost in the Middle” (Liu et al., 2023) demonstrated something counterintuitive: LLMs exhibit a U-shaped attention pattern. They’re great at using information at the beginning and end of the context, but significantly worse at using information in the middle.

This means every token of verbose git log or npm install output doesn’t just waste space — it pushes earlier context (your code, your instructions, the agent’s prior reasoning) into the attention dead zone. The more noise you add, the worse the agent gets at recalling what it was doing and why.

Even models with 200K+ token context windows aren’t immune. The degradation is about position, not capacity.

Irrelevant Context Degrades Reasoning

The GSM-IC study (Shi et al., ICML 2023) showed that adding irrelevant information to prompts “dramatically decreased” model performance — even when the model has the capability to solve the problem. LLMs get distracted by noise just like humans do.

In an agentic coding loop, this compounds. The agent runs a command, reasons about the output, runs another command. Each noisy output degrades the next reasoning step. Over a multi-hour session, the cumulative effect is measurable: the agent starts making worse decisions, forgetting earlier context, and repeating itself.

The Compounding Problem

Agentic sessions are iterative. Each loop adds more context. The effects compound in four ways:

  1. Context exhaustion: Noisy output fills the window faster, forcing earlier compaction or session restart. Sessions with filtered output last roughly 3x longer.
  2. Reasoning degradation: Each iteration pushes prior reasoning into less-attended positions.
  3. Cost multiplication: On pay-per-token models, 70% of spending can go to CLI noise. A 10-person team wastes roughly $1,750/month on tokens that actively make the agent worse.
  4. Rate limit pressure: On subscription plans, noisy output burns through quotas ~40% faster than necessary.

The fundamental insight: a 200K-token context filled with 80% noise performs worse than a 40K context filled with 100% signal. Token reduction is not just cost optimization — it’s reasoning quality optimization.

The Fix: Filter Before It Hits the Context

The solution is conceptually simple: intercept command output before it reaches the LLM and strip the noise. In practice, this requires knowing what’s noise and what’s signal for dozens of different CLI tools.
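Conceptually, the filter is just a thin layer between the shell and the model. Here is a minimal Python sketch of the idea — the noise patterns are purely illustrative, and this is not RTK's implementation:

```python
import re
import subprocess

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # ANSI escape sequences
# Illustrative noise patterns (cargo-style progress lines), not a real rule set.
NOISE = re.compile(r"^(Compiling|Downloading|Finished)", re.IGNORECASE)

def filter_output(text: str) -> str:
    """Strip ANSI codes and drop lines matching known noise patterns."""
    kept = []
    for line in ANSI.sub("", text).splitlines():
        if line.strip() and not NOISE.match(line.strip()):
            kept.append(line)
    return "\n".join(kept)

def run_filtered(cmd: list[str]) -> str:
    """Run a command and return only the signal lines of its output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return filter_output(proc.stdout + proc.stderr)
```

The hard part, as the section below shows, is not this scaffolding — it is knowing the right noise patterns for each of dozens of tools.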

RTK (Rust Token Killer)

github.com/rtk-ai/rtk — MIT, 11.7k stars

RTK is the clear category leader. It’s a single Rust binary that acts as a CLI proxy: it intercepts shell commands issued by AI agents, runs them, and compresses the output before it reaches the context window.

How it works: rtk init --global installs a PreToolUse hook in Claude Code’s settings. When the agent issues git status, the hook transparently rewrites it to rtk git status. The agent never sees the rewrite — it only receives the compressed output. Less than 10ms overhead per command.
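The rewrite itself is simple to picture. A hypothetical sketch of the rule a hook like this applies — the allowlist of command prefixes is illustrative, not RTK's actual pattern list:

```python
# Hypothetical sketch of the rewrite a PreToolUse-style hook performs.
# WRAPPED is an illustrative allowlist, not RTK's real command coverage.
WRAPPED = {"git", "cargo", "ls", "npm", "docker", "kubectl"}

def rewrite(command: str) -> str:
    """Prepend `rtk` when the command is one the proxy knows how to compress."""
    head = command.strip().split()[0] if command.strip() else ""
    if head in WRAPPED and head != "rtk":
        return f"rtk {command.strip()}"
    return command
```

Commands the proxy doesn't recognize pass through untouched, which is why the hook can be installed globally without breaking anything.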

Four compression strategies:

  • Smart filtering: removes progress bars, ANSI codes, boilerplate, timing info
  • Grouping: aggregates files by directory, errors by type
  • Truncation: preserves relevant context, cuts redundancy
  • Deduplication: collapses repeated log lines with occurrence counts
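The deduplication strategy, for instance, fits in a few lines. This sketch collapses consecutive repeats; the `(xN)` marker format is mine, not RTK's:

```python
from itertools import groupby

def dedupe(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line with a count."""
    out = []
    for line, run in groupby(text.splitlines()):
        n = sum(1 for _ in run)
        out.append(line if n == 1 else f"{line}  (x{n})")
    return "\n".join(out)
```

On a log where a retry loop prints the same error hundreds of times, this alone can reclaim most of the output.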

Measured savings (from the docs):

| Command | Before | After | Reduction |
| --- | --- | --- | --- |
| cargo test | 4,823 tokens | 11 tokens | 99% |
| git diff HEAD~1 | 21,500 tokens | 1,259 tokens | 94% |
| git log -n 10 | 1,430 tokens | 194 tokens | 86% |
| ls (large dir) | 3,200 tokens | 640 tokens | 80% |
| npm test | 25,000 tokens | 2,500 tokens | 90% |

It covers 40+ command patterns: git, cargo, docker, kubectl, npm/pnpm, pytest, vitest, playwright, eslint, tsc, ruff, golangci-lint, and more.

Agent support: Claude Code (native hook), Gemini CLI (Rust hook processor), OpenCode (plugin), and any MCP client via the rtk-mcp bridge.

The analytics are great: rtk gain shows cumulative savings, rtk gain --graph gives a 30-day ASCII chart, rtk discover scans your Claude Code history and tells you which unoptimized commands are wasting the most tokens. That last one is brilliant — it mines your actual usage to find optimization opportunities.

The RTK Ecosystem

RTK has spawned a constellation of community extensions:

  • rtk-mcp: MCP server bridge — use RTK from Cursor, Windsurf, Claude Desktop, any MCP client
  • openrtk: OpenCode plugin
  • pi-rtk: Extension for the Pi agent framework
  • rtk-dashboard: React-based real-time analytics dashboard
  • rtk-flake: Nix packaging

Beyond CLI Filtering: Other Approaches
#

RTK solves the CLI output problem. But there are other angles on token reduction worth knowing about.

AST-Based Skeletonization

skltn takes a different approach: instead of filtering command output, it skeletonizes code files. Using tree-sitter AST parsing, it reduces files to function signatures, type definitions, and docstrings — collapsing implementation bodies. Files under 2,000 tokens pass through unchanged; larger files get the skeleton treatment. Claims 5-15x more codebase fits in the same context window. Works as an MCP server.
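skltn does this with tree-sitter across many languages; the core idea can be sketched for Python files using the stdlib ast module — an illustration of the technique, not skltn's implementation:

```python
import ast

def skeletonize(source: str) -> str:
    """Reduce Python source to signatures + docstrings, dropping function bodies."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            body = []
            if doc is not None:
                # Keep the docstring so the skeleton still explains intent.
                body.append(ast.Expr(ast.Constant(doc)))
            # Replace the implementation with `...`
            body.append(ast.Expr(ast.Constant(...)))
            node.body = body
    return ast.unparse(tree)
```

The skeleton keeps everything an agent needs to navigate and call the code — names, parameters, documented intent — while discarding the bodies it rarely needs to read.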

Repomix (22.6k stars) packs an entire repository into a single AI-friendly file. Its --compress flag uses tree-sitter to extract key code elements. It's not a real-time proxy like RTK — more of a "prepare the repo for an AI conversation" tool, and the most popular tool in the broader "make code AI-friendly" space.

Statistical Context Selection

copt (Context Optimizer) uses Bayesian statistical methods to decide which context chunks to include. It breaks context into semantic chunks, applies Beta-Bernoulli modeling with Thompson Sampling, and uses feedback-driven learning to improve chunk selection over time. A completely different philosophy from rule-based filtering — it learns what’s useful.
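The underlying loop is easy to sketch: each chunk gets a Beta(alpha, beta) posterior over "this chunk turned out to be useful," selection samples those posteriors, and feedback updates them. This is a hypothetical sketch of the technique, not copt's API:

```python
import random

class ChunkSelector:
    """Thompson Sampling over context chunks with Beta-Bernoulli posteriors."""

    def __init__(self, chunk_ids):
        # Beta(1, 1) is the uniform prior: no opinion yet about any chunk.
        self.posterior = {c: [1.0, 1.0] for c in chunk_ids}

    def select(self, k):
        """Sample each posterior and include the k highest-scoring chunks."""
        scores = {c: random.betavariate(a, b) for c, (a, b) in self.posterior.items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def update(self, chunk_id, was_useful):
        """Feedback: a success increments alpha, a failure increments beta."""
        a, b = self.posterior[chunk_id]
        self.posterior[chunk_id] = [a + 1, b] if was_useful else [a, b + 1]
```

Sampling (rather than always taking the posterior mean) is what lets the selector keep occasionally trying chunks it is unsure about, instead of locking in early.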

.NET-Specific Filtering

DotnetTokenKiller targets .NET CLI commands specifically (dotnet build, test, restore). Strips SDK banners, MSBuild headers, progress lines, ANSI codes. If your stack is .NET, this complements RTK for the commands RTK doesn’t cover.
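Rule-based stripping for .NET output follows the same shape as any other filter. A sketch with illustrative patterns — these are not DotnetTokenKiller's actual rules:

```python
import re

# Illustrative .NET CLI noise patterns; not DotnetTokenKiller's real rule set.
DOTNET_NOISE = [
    re.compile(r"^MSBuild version "),
    re.compile(r"^\s*Determining projects to restore"),
    re.compile(r"^Welcome to \.NET"),
]

def strip_dotnet_noise(text: str) -> str:
    """Drop lines matching known .NET SDK banner/progress patterns."""
    kept = [line for line in text.splitlines()
            if not any(p.match(line) for p in DOTNET_NOISE)]
    return "\n".join(kept)
```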

How to Start

The highest-ROI move is installing RTK. It takes thirty seconds:

brew install rtk-ai/tap/rtk  # or: cargo install --git https://github.com/rtk-ai/rtk
rtk init --global

That’s it. Every Claude Code session from now on gets filtered output. No workflow changes, no new commands to learn. The hook is transparent.

After a few days, run rtk gain to see your actual savings. Then run rtk discover to see what’s still slipping through.

If you want to go further:

  • Add skltn as an MCP server for large codebase navigation
  • Use Repomix when preparing repo context for Claude.ai conversations
  • Keep an eye on copt if the statistical approach appeals to you

The Bigger Picture

The AI coding tool ecosystem is converging on a realization: context window management is infrastructure, not an afterthought.

We spent the last two years making context windows bigger. We’re now learning that bigger isn’t enough — cleaner matters more. A model reasoning over a pristine 40K-token context will outperform the same model reasoning over a noisy 200K-token context, every time.

Token reduction tools are the first generation of this infrastructure. Expect the next generation to get smarter: dynamic filtering based on the current task, learned models of what output is relevant, and tighter integration between agents and their output pipelines.

For now, RTK alone is worth the install. Your agent — and your wallet — will thank you.

Tool Comparison

| Tool | Approach | Savings | Stars | Best For |
| --- | --- | --- | --- | --- |
| RTK | CLI proxy, rule-based filtering | 60-99% per command | 11.7k | Daily agentic coding (Claude Code, Gemini, OpenCode) |
| skltn | AST skeletonization via tree-sitter | 5-15x more code in context | | Navigating large codebases |
| Repomix | Repo packaging with compression | varies | 22.6k | Preparing context for Claude.ai |
| copt | Bayesian chunk selection | learns over time | | Experimental / research-oriented |
| DTK | .NET CLI filtering | varies | | .NET-specific projects |
