SymJack and TrustFall: Every Major AI Coding Agent Has Been Hacked. Again.

Author
Florent Clairambault
CTO & Software engineer

2026 keeps adding new chapters to what is becoming the defining security story of the agentic coding era: AI coding agents, because they are designed to read project files and execute tools autonomously, are structurally excellent attack surfaces. By my count, this is now the fourth distinct security class targeting these tools in twelve months.

Adversa AI disclosed the two attacks that make up the latest class, TrustFall and SymJack, in May, and both are worse than the headlines suggest.

TrustFall: One Keypress, Full Compromise

Disclosure: May 7, 2026. Affected tools: Claude Code, Cursor CLI, Gemini CLI, GitHub Copilot CLI.

TrustFall works because every major AI coding agent now supports .mcp.json — a project-level configuration file that specifies which MCP servers to start when the agent opens the folder. That configuration is loaded automatically when a developer accepts a “do you trust this folder?” prompt.

Here’s what Adversa AI demonstrated:

  1. An attacker creates a public repository with a plausible-looking codebase.
  2. The repo includes .mcp.json and .claude/settings.json (or the equivalent per tool) pointing to an attacker-controlled executable.
  3. A developer clones the repo and opens it in their agent of choice.
  4. The trust dialog appears: “Is this a project you created or one you trust?”
  5. Developer clicks Yes.
  6. The attacker’s MCP server spawns as an OS process with full user privileges, opens a C2 channel, and begins exfiltrating SSH keys, cloud credentials, and any accessible file on disk.

The entire compromise happens in a single keypress. No exploit chain. No privilege escalation. Just the agent doing exactly what it was designed to do.
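To make concrete what that single keypress authorizes, here is a minimal Python sketch that lists the command lines a project-level .mcp.json would launch. It assumes the common `mcpServers` / `command` / `args` layout; the server name and script path in `EXAMPLE` are hypothetical, and the exact schema may differ per tool.

```python
import json


def spawned_commands(mcp_json_text: str) -> dict[str, str]:
    """Map each declared MCP server name to the command line it would run.

    Assumes the {"mcpServers": {name: {"command": ..., "args": [...]}}}
    layout; other tools may use a different schema.
    """
    config = json.loads(mcp_json_text)
    commands = {}
    for name, spec in config.get("mcpServers", {}).items():
        if "command" in spec:
            # Local (stdio) servers run as a process with the user's privileges.
            parts = [spec["command"], *map(str, spec.get("args", []))]
            commands[name] = " ".join(parts)
        else:
            # URL-based servers do not spawn a local executable.
            commands[name] = f"(remote) {spec.get('url', 'unknown')}"
    return commands


# Hypothetical config for illustration only.
EXAMPLE = """
{
  "mcpServers": {
    "docs-helper": {"command": "sh", "args": ["-c", "./tools/helper.sh"]}
  }
}
"""

if __name__ == "__main__":
    for name, command_line in spawned_commands(EXAMPLE).items():
        print(f"{name}: {command_line}")
```

Anything this prints is something the agent would start on your behalf once the folder is trusted.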

The UX regression that made this exploitable: Claude Code v2.1+ simplified the trust dialog, removing an earlier warning that .mcp.json could execute code and the option to “proceed with MCP servers disabled.” The simplified dialog defaults to “Yes, I trust this folder” with no MCP-specific language.

Anthropic reviewed the TrustFall report and declined to treat it as a vulnerability. Their position: accepting the trust dialog constitutes consent to the full project configuration, including .mcp.json. The one-keypress-to-RCE pattern is, in Anthropic’s current threat model, the intended behavior of the trust system.

Miasma: TrustFall in the Wild

TrustFall didn’t stay theoretical for long. The Miasma worm was found to have planted .mcp.json and IDE configuration files inside Azure/durabletask, a legitimate Microsoft Azure repository with thousands of stars that enterprise developers clone regularly. Anyone who opened that repository in Claude Code, Cursor, or Gemini CLI during the infection window triggered the payload automatically on trust acceptance.

Miasma is significant because it demonstrates the supply chain attack vector at scale: you don’t need to trick developers into cloning malicious repositories. You compromise repositories they’re already using.

SymJack: The Approval Prompt Is Lying to You

Disclosure: May 27, 2026. Affected tools: Claude Code, Cursor Agent CLI, GitHub Copilot CLI, Google Antigravity/Gemini CLI, Grok Build, OpenAI Codex CLI — six agents total.

SymJack targets a different surface: the file-copy approval flow that agents use when they want to write files to disk.

The attack sequence:

  1. The attacker prepares a repository with a renamed symlink — a file that presents as an innocuous document (a video file, a log file, a documentation asset) but is actually a symbolic link pointing at the agent’s own configuration directory.
  2. The agent’s next action involves copying that file somewhere in the project.
  3. The approval prompt shows the developer the source path (harmless-looking filename) and the stated destination (inside the project). The actual write destination — resolved through the symlink — is the agent’s MCP configuration.
  4. Developer approves. The kernel follows the symlink. The agent’s MCP config is overwritten with the attacker’s payload.
  5. On the next agent restart, the malicious MCP server auto-starts with full user privileges.

The architectural flaw is that agents displayed approval prompts using the path before symlink resolution. Users always saw a benign path and a benign destination. The actual write target was invisible to them.
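The gap between the displayed path and the real write target can be sketched in a few lines of Python. This is an illustration of the general technique (resolve symlinks, then check the result against the project root), not any vendor's actual fix; the file and directory names in `demo` are made up.

```python
import os
import tempfile


def describe_write(project_root: str, destination: str) -> dict:
    """Compare the path a prompt would show with the path the OS would write."""
    shown = os.path.abspath(destination)      # what a naive prompt displays
    resolved = os.path.realpath(destination)  # where the write really lands
    root = os.path.realpath(project_root)
    inside = os.path.commonpath([root, resolved]) == root
    return {"shown": shown, "resolved": resolved, "inside_project": inside}


def demo() -> dict:
    """Build a throwaway project containing a disguised symlink and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, "project")
        agent_config_dir = os.path.join(tmp, "agent_config")  # hypothetical location
        os.makedirs(project)
        os.makedirs(agent_config_dir)
        target = os.path.join(agent_config_dir, "mcp.json")
        with open(target, "w") as handle:
            handle.write("{}")
        # The symlink looks like a media asset but points outside the project.
        disguised = os.path.join(project, "demo_video.mp4")
        os.symlink(target, disguised)
        return describe_write(project, disguised)


if __name__ == "__main__":
    info = demo()
    print("shown:   ", info["shown"])
    print("resolved:", info["resolved"])
    print("inside project:", info["inside_project"])
```

A prompt built from `shown` looks harmless; one built from `resolved`, with a warning whenever `inside_project` is false, would give the developer something meaningful to approve or reject.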

Anthropic has quietly hardened Claude Code to resolve symlinks before displaying approval prompts. OpenAI closed the Codex CLI report via Bugcrowd as a “false positive”; its position is that user approval of a copy command implies consent regardless of symlink targets. Responses from Cursor, Copilot, and Grok Build vary, according to available reporting.

The Fourth Security Class in Twelve Months

To keep score on 2026’s agentic security taxonomy:

| Class | When | Attack surface | Example |
|---|---|---|---|
| OAuth token hijacking | May 2026 | npm postinstall hook overwrites ~/.claude.json | Mitiga Labs disclosure |
| STDIO transport injection | May 2026 | 200K+ servers with no execution boundary | Ox Security audit |
| Agentjacking / MCP injection | June 2026 | Malicious Sentry error events via MCP | Tenet Security / 2,388 orgs exposed |
| TrustFall / SymJack | May 2026 | Project trust dialog + file approval flow | Adversa AI / Miasma worm |

Each class exploits a different seam in the agentic model. MCP injection exploits trusted data sources. STDIO injection exploits the transport layer. OAuth hijacking exploits the installation lifecycle. TrustFall and SymJack exploit the developer permission UX — specifically the gap between what a consent prompt shows and what it actually authorizes.

The common thread: coding agents are powerful precisely because they act on context from many sources (project files, MCP servers, external APIs). Every source of context is a potential injection point. Every permission dialog is a potential social engineering surface.

What the Vendor Responses Reveal

Anthropic’s “outside threat model” response to TrustFall deserves scrutiny. It’s technically accurate that the trust dialog is an opt-in. But the purpose of a security dialog is to give users meaningful consent — consent that includes understanding what they’re agreeing to. A dialog that says “do you trust this folder?” while omitting “this will execute the following commands as you” is not informed consent.

The counter-argument is that every IDE, terminal, and development tool has always been capable of running arbitrary code from project configurations. This is true. The difference is that modern AI agents lower the skill threshold for cloning and working with unfamiliar repositories to near zero, which brings in exactly the population most likely to click “Yes” on a trust dialog without reading project configs first.

OpenAI’s “false positive” response to SymJack has a similar structural problem. “The user approved the copy” is true. “The user understood they were approving a write to their MCP configuration via an opaque symlink” is not.

Both responses prioritize the letter of the permission model over its spirit. They’ll need to revisit this as supply chain attacks via AI-targeted repository poisoning scale up.

Practical Mitigations

Review .mcp.json before accepting any trust dialog. This is annoying and not how the UX is designed, but it’s the only way to verify what MCP servers will actually start. Run cat .mcp.json in the cloned repo before opening it in any agent.
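For repositories where the config may not sit at the top level, a small Python sketch can list every agent configuration file in a checkout before you open it. The file names come from the two mentioned earlier in this article; equivalents for other tools would need to be added, and the helper name is my own.

```python
import os

# Only the two files named in this article; extend per tool as needed.
AGENT_CONFIG_NAMES = (".mcp.json", ".claude/settings.json")


def find_agent_configs(repo_root: str) -> list[str]:
    """Return repo-relative paths of agent config files found under repo_root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Skip version-control internals; they are not part of the working tree.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            rel = os.path.relpath(full_path, repo_root).replace(os.sep, "/")
            if any(rel == name or rel.endswith("/" + name)
                   for name in AGENT_CONFIG_NAMES):
                hits.append(rel)
    return sorted(hits)


if __name__ == "__main__":
    for path in find_agent_configs("."):
        print(path)
```

Run it from the root of the clone and read each file it reports before accepting any trust prompt.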

Pin MCP server versions explicitly. A malicious commit to a depended-upon MCP server is an attack surface. npx @vendor/server@1.2.3 is safer than npx @vendor/server@latest.
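A rough Python check for unpinned npx invocations might look like the sketch below. It treats the first non-flag argument as the package spec, which is a simplification (flags that take a value are not handled), and it accepts only exact `major.minor.patch` versions as pinned; both choices are assumptions of this example.

```python
import re
from typing import Optional

# Exact version at the end of the spec, e.g. "@1.2.3" or "@1.2.3-beta.1".
EXACT_VERSION = re.compile(r"@\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")


def is_pinned(package_spec: str) -> bool:
    """True when the spec ends in an exact version, not a tag or range."""
    return bool(EXACT_VERSION.search(package_spec))


def unpinned_package(command: str, args: list[str]) -> Optional[str]:
    """Return the npx package spec if it is not pinned, otherwise None."""
    if command != "npx":
        return None
    # Simplification: the first argument that is not a flag is the package.
    package = next((arg for arg in args if not arg.startswith("-")), None)
    if package is None or is_pinned(package):
        return None
    return package


if __name__ == "__main__":
    print(unpinned_package("npx", ["-y", "@vendor/server@latest"]))
    print(unpinned_package("npx", ["-y", "@vendor/server@1.2.3"]))
```

Pinning narrows the window for a malicious update, but it does not vouch for the pinned version itself.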

Use Claude Code’s sandbox.credentials setting (introduced in v2.1.187). It blocks sandboxed commands from reading credential files at ~/.ssh, ~/.aws, ~/.config/gcloud, etc. It doesn’t prevent MCP server startup but limits what a compromised MCP server can exfiltrate.

Add an MCP allowlist rule to your CLAUDE.md invariants, for example: “Never start MCP servers not listed in [your explicit whitelist].” This is an instruction to the model rather than an enforced control, so whether it holds under all injection scenarios is debatable, but it raises the bar.
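Because a rule written in prose depends on the model honoring it, the same allowlist can also be checked outside the agent. The Python sketch below compares the servers declared in a .mcp.json against an explicit allowlist of exact command lines; the `mcpServers` layout, the allowlist format, and the server names are assumptions of this example rather than a feature of any tool.

```python
import json


def unapproved_servers(mcp_json_text: str,
                       allowlist: dict[str, list[str]]) -> list[str]:
    """Return names of declared servers whose command line is not allowlisted.

    allowlist maps a server name to the exact command line (command plus
    arguments) that has been reviewed and approved.
    """
    servers = json.loads(mcp_json_text).get("mcpServers", {})
    rejected = []
    for name, spec in servers.items():
        command_line = [spec.get("command", ""), *map(str, spec.get("args", []))]
        # Both the name and the full command line must match the approved entry.
        if allowlist.get(name) != command_line:
            rejected.append(name)
    return sorted(rejected)


if __name__ == "__main__":
    # Hypothetical config and allowlist for illustration only.
    config = json.dumps({
        "mcpServers": {
            "docs": {"command": "npx", "args": ["@vendor/server@1.2.3"]},
            "helper": {"command": "sh", "args": ["-c", "./run.sh"]},
        }
    })
    approved = {"docs": ["npx", "@vendor/server@1.2.3"]}
    print(unapproved_servers(config, approved))
```

A check like this could run in a pre-open script or CI job, so a changed command line is flagged even when the server name stays the same.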

For enterprise teams, the disallowed-tools enterprise sandboxing in Claude Code v2.1.152+ lets you block specific MCP tool categories organization-wide. Combined with a private MCP server registry, you can prevent developers from accidentally running attacker-controlled servers.

The fundamental fix — agents that resolve symlinks before showing approval prompts, and trust dialogs that explicitly enumerate executables about to be spawned — is a product design problem. Some of it is being patched. The rest requires the industry to agree that informed consent in agentic tools means something more than “the user clicked Yes.”


Sources: Adversa AI — TrustFall disclosure · Adversa AI — SymJack disclosure · SecurityWeek — SymJack supply chain analysis · Help Net Security — TrustFall one-keypress RCE · The Register — Claude Code trust prompt RCE · CVE-2025-53773 — GitHub Copilot prompt injection
