---
title: "What AI Actually Needs in a Spec (It's Not What You Think)"
date: 2026-06-19
tags: ["spec-driven-development","specifications","ears","requirements-engineering","ai-coding","guides"]
categories: ["Spec-Driven Development","Guides"]
summary: "Research on 20,574 real agentic coding sessions found ~42% of agent failures trace back to specification quality, not model capability. EARS notation, a requirements syntax published by Rolls-Royce engineers in 2009, is emerging as the format that best bridges human intent and AI execution. Here's what the data says about writing specs AI can actually use."
---


The instinct when an AI agent fails is to blame the model. Tweak the prompt, add more context, switch to a smarter model. This instinct is usually wrong.

A 2026 analysis of 20,574 real-world agentic coding sessions (arXiv:2605.29442) found that approximately 42% of agent failures trace to instruction and specification quality: misread developer intent (27.0%) plus underspecified instructions (15.4%), or 42.4% combined. Fewer than a third trace to model capability limitations. The agent isn't failing. The spec is.

This isn't a prompt engineering problem. It's a structural one. The way humans write requirements for other humans is systematically different from what AI agents need to execute reliably. Understanding that gap — and the formats that bridge it — is one of the highest-leverage skills in agentic software development.

## The Compression Paradox

The intuitive response to "my specs aren't working" is to add more detail. The second intuitive response, when that bloats the context, is to compress — be more concise, strip the ceremony, get to the point.

The data says this is expensive.

The authors of arXiv:2604.07502 tested four information density conditions on agentic coding tasks. The most compressed representation reduced input tokens by 17.1% and increased total session cost by **67.2%**. The session-to-file token ratio ballooned from 2.3x to 4.7x. Wall-clock time went from 1m 36s to 7m 00s. Compression that saved tokens at read time cost far more at inference time, because the model had to spend reasoning cycles reconstructing context that was stripped out.
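Putting those figures on a common scale makes the trade-off easier to see. The short calculation below uses only the numbers quoted in the paragraph above; the variable names are illustrative, and the derived ratios are simple arithmetic rather than results reported by the paper.

```python
# Figures quoted above for the baseline vs. the most compressed condition.
baseline_seconds = 1 * 60 + 36    # 1m 36s
compressed_seconds = 7 * 60 + 0   # 7m 00s

# How much longer the compressed condition ran.
wall_clock_factor = compressed_seconds / baseline_seconds

# Growth of the session-to-file token ratio (2.3x -> 4.7x).
ratio_growth = 4.7 / 2.3

# Session cost increase per unit of input tokens saved.
input_token_saving = 0.171
session_cost_increase = 0.672
cost_per_saving = session_cost_increase / input_token_saving

print(f"wall-clock time grew {wall_clock_factor:.2f}x")
print(f"session-to-file ratio grew {ratio_growth:.2f}x")
print(f"each 1% of input tokens saved coincided with ~{cost_per_saving:.1f}% more session cost")
```

In other words, a saving of roughly one sixth of the input tokens coincided with more than a fourfold increase in wall-clock time.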

The principle that emerges is *semantic density*, not brevity. Strip zero-information structural tokens (boilerplate, ceremonial prose, redundant headers). Preserve high-information semantic tokens (descriptive names, type annotations, constraints, interface contracts). A Java Spring Boot endpoint is roughly 170 lines, of which about 18 are business logic: an overhead ratio of roughly 8:1 (about 152 lines of ceremony to 18 of logic). Stripping that ceremony is good. Stripping the constraint that says "this endpoint must be idempotent" is not.
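As a rough illustration of what "high-information semantic tokens" look like, here is a minimal sketch of a contract that keeps descriptive names, type annotations, and the idempotency constraint while carrying almost no ceremony. Everything in it (`ChargeRequest`, `ChargeResult`, `InMemoryCharges`, the `idempotency_key` field) is hypothetical and chosen for illustration, not taken from the cited research.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeRequest:
    idempotency_key: str  # caller-supplied; identifies one logical charge
    amount_cents: int     # must be positive
    currency: str


@dataclass(frozen=True)
class ChargeResult:
    charge_id: str
    amount_cents: int


class InMemoryCharges:
    """Hypothetical charge service used only to illustrate the constraint."""

    def __init__(self) -> None:
        self._by_key: dict[str, ChargeResult] = {}

    def charge(self, request: ChargeRequest) -> ChargeResult:
        """Constraint: this operation must be idempotent.

        Repeating a request with the same idempotency_key returns the
        original result and creates no second charge.
        """
        if request.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        existing = self._by_key.get(request.idempotency_key)
        if existing is not None:
            return existing
        result = ChargeResult(
            charge_id=f"ch_{len(self._by_key) + 1}",
            amount_cents=request.amount_cents,
        )
        self._by_key[request.idempotency_key] = result
        return result

    def charge_count(self) -> int:
        return len(self._by_key)


service = InMemoryCharges()
request = ChargeRequest(idempotency_key="order-17", amount_cents=2500, currency="EUR")
first = service.charge(request)
second = service.charge(request)  # a retry of the same logical charge
```

The docstring constraint and the `idempotency_key` name are the tokens worth keeping: remove them and the retry behavior becomes a guess.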

## Why AI Needs Different Specs Than Humans

When a human reads a spec, they fill gaps with domain knowledge, cultural context, and organizational memory. They ask clarifying questions. They remember that the last three times a requirement said "performant," the actual threshold was 200ms p95.

AI agents have none of that. They fill gaps with plausibility — they generate code that is syntactically correct, internally consistent, and entirely reasonable *in the absence of context they weren't given*. The result passes review because it looks right. It fails in production because it doesn't match the unstated constraint.

What AI actually needs in a spec:

- **Explicit preconditions and postconditions.** Not "the system handles invalid input" but "when the request body is malformed JSON, the system returns HTTP 400 with error code `INVALID_REQUEST` and no database write occurs."
- **State machine logic.** AI agents handle discrete state transitions well when the states are named; they hallucinate edge cases when state is implicit.
- **Integration contracts.** Interface types, expected response shapes, error codes. The agent doesn't know your downstream service's quirks unless you state them.
- **Invariants and constraints.** Things that must remain true across all implementations. The constraint reduces the solution space; without it, the agent optimizes for plausibility, not correctness.
- **Explicit non-goals.** What this spec is *not* trying to do. This is undervalued — "this endpoint does not need to support pagination in v1" prevents the agent from gold-plating.
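The malformed-JSON requirement from the first bullet is specific enough to translate almost mechanically into code and a check. The sketch below assumes a hypothetical handler `handle_create` and a stand-in `FakeDatabase`; the success status (201) and the response shapes are assumptions, since the quoted requirement only defines the failure path.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    """Stand-in for a real datastore; records every write it receives."""

    writes: list = field(default_factory=list)

    def insert(self, record: dict) -> None:
        self.writes.append(record)


def handle_create(raw_body: str, db: FakeDatabase) -> tuple[int, dict]:
    """Return (status_code, response_body) for a create request."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        # Postcondition from the spec: HTTP 400, INVALID_REQUEST, no write.
        return 400, {"error": "INVALID_REQUEST"}
    if not isinstance(payload, dict):
        # Assumption: valid JSON that is not an object is rejected as well.
        # The quoted requirement does not cover this case.
        return 400, {"error": "INVALID_REQUEST"}
    db.insert(payload)
    return 201, {"status": "created"}


db = FakeDatabase()
status, body = handle_create('{"name": ', db)  # truncated, malformed JSON
```

Note that writing the handler surfaced a case the requirement leaves open (well-formed JSON that is not an object), which is exactly the kind of gap an agent would otherwise fill with a plausible guess.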

## EARS: The Accidental AI Spec Format

In 2009, Alistair Mavin et al. at Rolls-Royce published a requirements format developed while analysing airworthiness regulations for a jet engine control system, a domain where ambiguous requirements can cost lives. They called it EARS: Easy Approach to Requirements Syntax.

EARS constrains natural language to five sentence patterns:

| Type | Template |
|------|----------|
| Ubiquitous | `The <system> shall <response>` |
| State-driven | `While <precondition>, the <system> shall <response>` |
| Event-driven | `When <trigger>, the <system> shall <response>` |
| Unwanted behavior | `If <trigger>, then the <system> shall <response>` |
| Optional feature | `Where <feature is included>, the <system> shall <response>` |

Patterns compose: `While the user is authenticated, when a payment is submitted, the system shall validate the card and respond within 3 seconds.`
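Because the patterns are keyed on a small set of leading keywords, a requirement can be classified with very little code. The following is a minimal sketch, not an official EARS parser: `parse_ears` and `EarsRequirement` are hypothetical names, and the function assumes clauses are separated by commas and contain no commas themselves.

```python
from dataclasses import dataclass

# Leading keyword of each clause -> pattern name from the table above.
CLAUSE_KEYWORDS = {
    "while": "state-driven",
    "when": "event-driven",
    "if": "unwanted behavior",
    "where": "optional feature",
}


@dataclass(frozen=True)
class EarsRequirement:
    patterns: tuple[str, ...]
    system: str
    response: str


def parse_ears(sentence: str) -> EarsRequirement:
    """Split an EARS sentence into its patterns, system, and response."""
    head, sep, response = sentence.strip().rstrip(".").partition(" shall ")
    if not sep:
        raise ValueError("not an EARS requirement: missing 'shall'")
    # Everything before the last comma is a clause; the rest names the system.
    *clauses, subject = [part.strip() for part in head.split(",")]
    patterns = []
    for clause in clauses:
        keyword = clause.split(" ", 1)[0].lower()
        if keyword not in CLAUSE_KEYWORDS:
            raise ValueError(f"unrecognized clause keyword: {keyword!r}")
        patterns.append(CLAUSE_KEYWORDS[keyword])
    # Drop the "then" of the If/then pattern and the leading article.
    for prefix in ("then ", "the "):
        if subject.lower().startswith(prefix):
            subject = subject[len(prefix):]
    # A sentence with no leading clauses is the ubiquitous pattern.
    return EarsRequirement(tuple(patterns) or ("ubiquitous",), subject, response)


composed = parse_ears(
    "While the user is authenticated, when a payment is submitted, "
    "the system shall validate the card and respond within 3 seconds."
)
print(composed.patterns)  # ('state-driven', 'event-driven')
```

A sentence that fails to parse is a useful signal in itself: it usually means the trigger, the state, or the responsible system was never stated.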

EARS was designed to reduce ambiguity in natural-language requirements for safety-critical systems. It happens to be precisely what AI agents need: explicit triggers, preconditions, actors, and responses in a fixed sentence structure. The model doesn't have to infer what initiates the behavior. It doesn't have to guess what state the system needs to be in. A well-formed EARS requirement maps almost directly onto a testable acceptance criterion.

Airbus, Bosch, NASA, Intel, and Siemens adopted EARS for human engineers. AWS adopted it for AI.

## How the Ecosystem Is Converging

**AWS Kiro** (launched July 2025) uses EARS as its native requirements format. When you describe a feature in natural language, Kiro converts it into EARS-formatted requirements before generating architecture and implementation tasks. The three output files — `requirements.md` (EARS + acceptance criteria), `design.md` (architecture and data models), `tasks.md` (ordered implementation checklist) — form a complete specification that Kiro's agents can execute without human clarification loops.

This matters because Kiro is a commercial IDE product shipping to enterprises. EARS isn't being used because it's theoretically elegant — it's being used because it works at scale with real agents on real codebases.

**GitHub Spec Kit** (111K GitHub stars as of June 2026) takes a complementary approach. Rather than constraining the language itself, Spec Kit structures the *process*: four phases (Specify → Plan → Tasks → Implement), each producing a markdown file the next phase reads. A `constitution.md` captures non-negotiable organizational principles — technology stack, testing conventions, architectural invariants. EARS is a community-requested integration, not yet native.

The divergence between these two approaches is instructive. Kiro bets on format constraints (EARS) to make requirements unambiguous. Spec Kit bets on process constraints (phased workflow) to ensure completeness. The right answer for your team probably depends on whether your failure mode is underspecification (Spec Kit addresses this) or ambiguity in individual requirements (EARS addresses this).

## The Anatomy of a Spec AI Can Execute

The difference between a spec that works and one that doesn't is usually the difference between naming a feature and stating its behavior:

**A spec AI misreads:**

> Add rate limiting to the API.

**A spec AI can execute:**

> While the API is handling requests, when a client IP exceeds 100 requests per 60-second window, the system shall return HTTP 429 with header `Retry-After: <seconds>` and shall not process the request. The rate limit counter shall reset at 60-second boundaries (not sliding windows). The `POST /auth/token` endpoint is excluded from rate limiting. Rate limit state shall be stored in Redis with 90-second TTL to handle counter expiry cleanly.

The second version has more words. It also leaves far fewer decisions open. The agent doesn't have to guess what "rate limiting" means in your system, what the threshold is, how reset windows work, or which endpoints are exempt. Every one of those unstated items in version one becomes an implementation choice the agent makes without you, and any of them can become a bug you discover in production.
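To show how directly the executable spec maps to code, here is a minimal sketch of a fixed-window limiter that follows it. It makes several assumptions that the spec does not settle: an in-memory dict stands in for Redis (so the 90-second TTL is not modelled), `Retry-After` is taken to mean the seconds remaining until the next window boundary, and the names `FixedWindowLimiter` and `check` are hypothetical.

```python
from dataclasses import dataclass, field

LIMIT = 100                        # requests allowed per window
WINDOW_SECONDS = 60                # fixed windows, not sliding
EXEMPT = {("POST", "/auth/token")}


@dataclass
class FixedWindowLimiter:
    # (client_ip, window_index) -> request count. In-memory stand-in for
    # Redis; a real store would also expire each key (the spec says 90s TTL).
    counters: dict = field(default_factory=dict)

    def check(self, client_ip: str, method: str, path: str, now: float) -> tuple[int, dict]:
        """Return (status_code, headers) for a request arriving at `now` seconds."""
        if (method, path) in EXEMPT:
            return 200, {}
        # Windows are aligned to 60-second boundaries of the clock.
        window_index = int(now // WINDOW_SECONDS)
        key = (client_ip, window_index)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        if count > LIMIT:
            # Assumption: Retry-After is the time left in the current window.
            retry_after = WINDOW_SECONDS - int(now % WINDOW_SECONDS)
            return 429, {"Retry-After": str(retry_after)}
        return 200, {}


limiter = FixedWindowLimiter()
# 101 requests from one client, all 30 seconds into the same window.
results = [limiter.check("10.0.0.1", "GET", "/items", now=30.0) for _ in range(101)]
```

Even here, two choices had to be made that the spec leaves open (the meaning of `<seconds>` and what the windows are aligned to), which suggests the next revision of the requirement should pin them down.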

## What SDD Gets Right

The Spec-Driven Development movement is, at its core, a recognition that the specification is the real artifact — not the code. Code is the spec materialized by an AI agent. When the materialization is wrong, the spec was wrong (or absent).

The research supports this frame. The 42% of agent failures that trace to specification quality aren't failures of the agent. They're failures of the upstream process that handed the agent an underspecified task. Fixing this at the prompt level is the wrong abstraction. Fixing it at the specification level, before a single line of code is written, is where the leverage lives.

The formats that work (EARS, Spec Kit's phased structure, the BDD Given/When/Then pattern for acceptance criteria) share a common property: they force the human to make decisions that would otherwise be deferred to the AI. That deferral is where bugs are born.

The question isn't "how do I write better prompts?" It's "what needs to be true about the specification so that the correct implementation is the only plausible implementation?" Answer that, and the agent does the rest.

---

**Sources**: [arXiv:2605.29442](https://arxiv.org/html/2605.29442v1) · [arXiv:2604.07502](https://arxiv.org/abs/2604.07502) · [EARS origin — Mavin et al.](https://alistairmavin.com/ears/) · [GitHub Spec Kit](https://developer.microsoft.com/blog/spec-driven-development-spec-kit) · [AWS Kiro on EARS](https://builder.aws.com/content/2zeZNMGcgW2sVMoXb7U80hH8kBw/kiro-agentic-ai-ide-beyond-a-coding-assistant-full-stack-software-development-with-spec-driven-ai) · [Thoughtworks on SDD](https://www.thoughtworks.com/en-us/insights/blog/agile-engineering-practices/spec-driven-development-unpacking-2025-new-engineering-practices)

