---
title: "The Spec File as Source of Truth: How to Write Specs That AI Can Actually Implement"
date: 2026-05-05
tags: ["spec-driven development","SDD","AI coding","Claude Code","agentic workflows","prompt engineering"]
categories: ["Spec-Driven Development","Guides"]
summary: "Writing specs instead of code is the core premise of SDD — but a bad spec produces bad code just as reliably as a bad prompt does. Here's what separates specs that AI can execute reliably from the ones that waste hours of compute and your afternoon."
---


You've heard the pitch: in the age of AI coding agents, your job is to write the spec, not the code. Let the agent handle the implementation. Iterate on the output, not the keystrokes.

The pitch is correct. But it skips the hard part: writing a spec that works.

A vague spec produces vague code. An ambiguous spec produces code that's technically correct and functionally wrong. A spec that doesn't define its own boundaries produces an agent that wanders, hallucinates constraints, and confidently ships the wrong thing. Garbage in, garbage out — except now the garbage comes back 400 lines long and passes linting.

This is the skill that actually matters in 2026: writing specs that AI can reliably execute. Here's how to do it.

## What a Spec File Is (and What It Isn't)

A spec file is not a feature request. It's not a Jira ticket dressed up in markdown. It's a complete, self-contained description of a unit of software work — written from the AI's perspective, not the product manager's.

The distinction matters. A feature request tells you *what* the business wants. A spec file tells an agent *exactly* what to build, what constraints to respect, what "done" looks like, and what *not* to touch. An agent that reads a feature request will hallucinate the rest. An agent that reads a tight spec will execute it.

Think of the spec as an interface contract — like a function signature, but for a task. Everything the agent needs to know should be inside it. Nothing it doesn't need should clutter it.
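To make the analogy concrete: a TypeScript signature tells the caller everything it may rely on and nothing about the internals. The names below are hypothetical, purely for illustration.

```typescript
// A signature is a contract: callers see the inputs and the output
// shape, and nothing else. A spec plays the same role for a task.
interface Post {
  title: string;
  url: string;
}

async function searchPosts(query: string, limit: number): Promise<Post[]> {
  // How the search works is the implementation's business,
  // just as how a task gets done is the agent's.
  return [];
}
```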

## The Anatomy of a Well-Written Spec

Every spec that reliably produces correct implementations has five sections (a skeleton template follows the list):

**1. Context** — what already exists that the agent must understand. Not the full history of the project; just the relevant surface. Which files, which modules, which API contracts. What the agent should read before it starts. In Claude Code, this section supplements `CLAUDE.md` with task-specific context.

**2. Goal** — a single, unambiguous statement of what the task produces. One sentence. If you need more than one sentence, you have more than one task.

**3. Constraints** — what must not change. Which files are off-limits. Which interfaces are frozen. Which behavior must be preserved. This is where most spec writers fail: they define what to build and forget to define what to protect.

**4. Acceptance criteria** — a checklist of verifiable conditions. Not "it should feel fast" but "the `/api/search` endpoint must return a response in under 200ms at p95 on a 10,000-record dataset." Each criterion should be testable without human judgment. If a CI pipeline can't check it, it doesn't belong in acceptance criteria — it belongs in the implementation notes.

**5. Implementation notes** (optional) — hints about approach that you know from domain knowledge the agent doesn't have. Algorithm preferences. Third-party library choices. Known edge cases. This section is for *hints*, not instructions. If you're specifying the implementation, you're writing code, not a spec.
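Collapsed into a reusable template, the five sections look like this:

```
## Context
- Relevant files, modules, and API contracts
- What to read before starting

## Goal
One sentence: what this task produces.

## Constraints
- Files that are off-limits
- Interfaces that are frozen
- Behavior that must be preserved

## Acceptance criteria
- [ ] A condition a CI pipeline can check
- [ ] Named test cases

## Implementation notes (optional)
- Hints, not instructions: algorithm preferences, library choices, known edge cases
```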

## Bad Spec vs. Good Spec: A Concrete Example

**Bad:**
```
Add rate limiting to the API.
```

An agent presented with this will pick an algorithm (token bucket or leaky bucket?), choose where to apply it (per-IP? per-user? per-endpoint?), decide where to store state (in-memory? Redis?), pick a library, and ship something. That something will be coherent and almost certainly wrong, because none of those decisions were yours.

**Good:**
```
## Context
- API is an Express.js app (src/api/)
- Redis instance available via REDIS_URL env var
- Authentication middleware in src/middleware/auth.ts adds req.userId

## Goal
Add per-user rate limiting to all /api/* routes: 100 requests per 15-minute window.

## Constraints
- Do not modify src/middleware/auth.ts
- Do not change existing API response shapes
- Must use the `express-rate-limit` package (already in package.json) 
  with `rate-limit-redis` store

## Acceptance criteria
- [ ] Requests beyond the limit return HTTP 429 with body: { error: "rate_limit_exceeded" }
- [ ] The X-RateLimit-Remaining header is set on every response
- [ ] Rate limit is per-user (req.userId), not per-IP
- [ ] Existing tests pass without modification
- [ ] New tests cover: under-limit request, limit-hit request, window reset

## Implementation notes
- Set windowMs and max via env vars RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX
  (defaults: 900000ms, 100 requests) so QA can override in testing
```

The second spec is longer. It takes five extra minutes to write. It eliminates an entire category of rework — the kind where the implementation is technically correct but misses a decision you made in your head and never wrote down.
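For illustration, here is a sketch of what a compliant implementation could look like. It is not the agent's actual output; it assumes node-redis v4 with recent versions of `express-rate-limit` and `rate-limit-redis`, and the file path is hypothetical.

```typescript
// src/middleware/rateLimit.ts (hypothetical path)
// Every option below traces back to a decision the spec already made.
import rateLimit from "express-rate-limit";
import { RedisStore } from "rate-limit-redis";
import { createClient } from "redis";
import type { Request, Response } from "express";

const redisClient = createClient({ url: process.env.REDIS_URL });
redisClient.connect(); // fire-and-forget for brevity; await this in real startup code

export const apiRateLimiter = rateLimit({
  // Window and max come from env vars, per the implementation notes
  windowMs: Number(process.env.RATE_LIMIT_WINDOW_MS ?? 900_000),
  max: Number(process.env.RATE_LIMIT_MAX ?? 100),
  // Per-user, not per-IP: the auth middleware has already set req.userId
  keyGenerator: (req: Request) => (req as Request & { userId: string }).userId,
  // X-RateLimit-* headers on every response, per the acceptance criteria
  legacyHeaders: true,
  standardHeaders: false,
  // 429 with the exact body the spec demands
  handler: (_req: Request, res: Response) =>
    res.status(429).json({ error: "rate_limit_exceeded" }),
  store: new RedisStore({
    sendCommand: (...args: string[]) => redisClient.sendCommand(args),
  }),
});
```

Mounted with `app.use("/api", apiRateLimiter)`, it covers every `/api/*` route. Nothing here had to be guessed; the spec made every decision.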

## Common Spec-Writing Mistakes

**Omitting the constraints section.** This is the most common failure mode. Agents are optimists — if something isn't forbidden, they'll change it. Every file that's off-limits, every interface that's frozen, every behavior that must be preserved: write it down.

**Writing acceptance criteria that require judgment.** "The UI should be responsive" is not a criterion. "All pages must pass Lighthouse mobile score ≥ 90" is. If you can't write a test for it, the agent can't verify it either.

**Mixing multiple tasks in one spec.** Each spec should map to one commit, one PR, one verifiable outcome. If your goal statement has "and" in it, split the spec.

**Specifying implementation in the constraints.** There's a difference between "use the `express-rate-limit` library" (a constraint that protects a decision already made) and "implement a token bucket algorithm that refills at 6.67 requests per minute" (you're writing the code). The former is a spec. The latter is micromanagement that makes the agent's output worse, not better.

**No test requirement.** If the spec doesn't include acceptance criteria that reference tests, the agent will either skip tests or write tests that verify what it built rather than what you specified. Require specific test coverage. Name the test cases.
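For the rate-limit spec above, named test cases might look like the sketch below, assuming Jest and supertest (the spec doesn't pin a test framework, so both are illustrative choices):

```typescript
import request from "supertest";
import { app } from "../src/api/app"; // hypothetical module exporting the Express app

const auth = { Authorization: "Bearer test-user-token" }; // hypothetical test credential
const max = Number(process.env.RATE_LIMIT_MAX ?? 100);

describe("per-user rate limiting", () => {
  it("serves requests under the limit and sets X-RateLimit-Remaining", async () => {
    const res = await request(app).get("/api/posts").set(auth);
    expect(res.status).not.toBe(429);
    expect(res.headers["x-ratelimit-remaining"]).toBeDefined();
  });

  it("returns 429 with the specified body once the limit is hit", async () => {
    for (let i = 0; i < max; i++) {
      await request(app).get("/api/posts").set(auth);
    }
    const res = await request(app).get("/api/posts").set(auth);
    expect(res.status).toBe(429);
    expect(res.body).toEqual({ error: "rate_limit_exceeded" });
  });

  // Window reset needs clock control or a short window via the env override
  it.todo("allows requests again after the window resets");
});
```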

## How Claude Code Uses Specs

Claude Code operates on two layers of spec. The first is `CLAUDE.md` — the persistent, project-level spec that defines architecture conventions, off-limits files, invariants that must never break, and the style of interaction you expect. This is your standing contract with the agent.
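For a sense of what that standing contract contains, here is a hypothetical excerpt, reusing conventions from the rate-limiting example:

```
# CLAUDE.md (hypothetical excerpt)

## Architecture conventions
- API routes live in src/api/; shared middleware in src/middleware/

## Off-limits
- Never modify src/middleware/auth.ts or anything under migrations/

## Invariants
- Existing API response shapes are frozen; additive changes only

## Interaction
- Ask before adding a new dependency
- Every change ships with tests
```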

The second layer is the per-task spec — the document you write for a specific piece of work. In a Spec-Driven Development workflow, this is typically a file you commit to the repo (a `specs/` directory works well) before you invoke the agent. The agent reads it, plans against it, and executes it. You review the output against the acceptance criteria. If a criterion fails, you don't re-prompt with "fix the test" — you examine whether the spec was ambiguous and update it first.

The spec becomes version-controlled documentation of your design decisions. Six months from now, it tells you not just what was built but why the constraints existed.

## A Worked Example: From Idea to Executable Spec

Suppose you want to add a search feature to a blog. The naive path: "Hey Claude, add search to my blog." The SDD path:

First, you write the spec. You decide: search scope (posts only, not pages), search fields (title + summary, not body — body is too large and too expensive to index), result count (max 10), UI (a modal triggered by `Cmd+K`, consistent with the existing design system), where the search index lives (build-time JSON, not a server), and what "done" looks like (a Lighthouse performance score that doesn't regress, a keyboard accessibility test that passes).
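Written down, those decisions become the spec; file and component names are illustrative:

```
## Context
- Static blog; posts and pages are separate content types
- Existing design system includes a modal component

## Goal
Add client-side search over posts, triggered by a Cmd+K modal.

## Constraints
- Search posts only, not pages
- Index title + summary only; never the body
- The index is build-time JSON; no server component

## Acceptance criteria
- [ ] Results are capped at 10
- [ ] Modal opens on Cmd+K and matches the existing design system
- [ ] Lighthouse performance score does not regress
- [ ] Keyboard accessibility test passes
```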

Then you commit the spec and invoke the agent. The agent has everything it needs. It doesn't guess. It implements exactly what you decided — because you made those decisions in the spec instead of in your head.

The output is reviewable against the criteria. The criteria are testable. The spec is the diff you show your team when someone asks why search works the way it does.

## The Payoff

The value of a tight spec isn't just better AI output — though it is that. It's that writing the spec forces you to make decisions you'd otherwise defer to implementation time, where they're expensive to change.

Agents are fast. Unclear thinking is the bottleneck. The spec file is where you do your thinking. Do it well, and you're not supervising an AI writing code — you're shipping software.


