---
title: "Why Engineers Are Writing Specs in HTML (And When You Should Too)"
date: 2026-05-15
tags: ["spec-driven-development","html","ai-agents","specs","claude-code","structured-data"]
categories: ["Spec-Driven Development","Guides"]
summary: "A growing number of engineering teams are ditching Markdown for HTML when writing specs — not because they enjoy writing more verbose documents, but because HTML's semantic structure gives AI agents significantly richer context when implementing from a spec. Here is where the tradeoff makes sense and how to do it well."
---


Markdown is the default format for everything in software engineering: README files, wikis, ADRs, specs. It is frictionless, readable in any text editor, and renders beautifully in GitHub. For most purposes it is perfectly fine.

But "perfectly fine" is not the same as "optimal for machine consumption." When you are practicing Spec-Driven Development — writing a spec and handing it to an AI agent to implement — the format of that spec is not a cosmetic detail. It is load-bearing infrastructure.

A growing number of teams are discovering that HTML, specifically semantic HTML, is a better substrate for complex specs. Not because HTML is fun to write, but because the semantic signal it carries meaningfully changes what an AI agent can infer from the document.

## The Semantic Gap Between Markdown and HTML

Consider two ways of marking up the same content.

In Markdown:

```markdown
# Auth

## Token format
...

## Refresh logic
...
```

In semantic HTML:

```html
<section id="auth" data-owner="team-identity" data-status="approved">
  <h2>Auth</h2>
  <section id="auth-token-format">
    <h3>Token format</h3>
    ...
  </section>
  <section id="auth-refresh-logic">
    <h3>Refresh logic</h3>
    ...
  </section>
</section>
```

The Markdown version is a flat list of headings. The HTML version is a graph. The agent reading the HTML version knows that "Refresh logic" is a child of "Auth," that a team named `team-identity` owns this section, and that the section has been approved — not just drafted. It can also link directly to `#auth-token-format` from anywhere else in the document without ambiguity.

In a 20-page spec this distinction is academic. In a 100-page spec covering authentication, payments, notifications, compliance, and internal APIs, it becomes the difference between an agent that navigates the document purposefully and one that drifts.

## Where HTML Genuinely Wins

**Large, multi-team specs.** When more than one team owns different sections of a spec, `data-owner` attributes give you machine-readable provenance without cluttering the human-readable content. An agent generating code for the payments flow can filter the spec to only sections where `data-owner="team-payments"` and avoid pulling in noise from adjacent sections.

**Stable internal cross-references.** Markdown's internal link syntax (`[see auth](#auth)`) works, but the anchor targets are derived from heading text, which changes. In HTML, `id` attributes are explicit and stable. A spec that references `<a href="#auth-refresh-logic">refresh behavior</a>` will not silently break if someone rewords the heading.

**Tabular data and API definitions.** HTML tables explicitly separate `<thead>` from `<tbody>`. A `<dl>` (definition list) is semantically perfect for API field definitions — each `<dt>` is a field name, each `<dd>` is its description and type. AI agents reading these elements know they are processing structured data, not prose.

**Status tracking.** `data-status="draft"` versus `data-status="approved"` versus `data-status="deprecated"` gives an agent immediate signal about which sections to implement against and which to flag for review. This is metadata that Markdown forces you to embed inline as text — where it is invisible to automated parsing.

**Long-lived living documentation.** If a spec will outlive the initial implementation and serve as the canonical reference for a system, HTML's explicit structure makes it easier to maintain, diff, and query over time.

## Practical Patterns Worth Adopting

Use `<section>` with explicit `id` attributes for every major and minor section. Use `<article>` for self-contained components (a single API endpoint, a single data model). Use `<aside>` for rationale, edge cases, and notes that should inform the agent but are not implementation requirements.

Use `<details>` and `<summary>` to fold verbose reference material — exhaustive field tables, example payloads, error code lists — so the document stays navigable for human readers without hiding anything from the agent, which processes the full DOM.

Embed Mermaid diagrams inline inside a `<figure>` with a `<figcaption>`. Claude Code and most modern agents will parse the diagram source, giving them a structured representation of flows and state machines without relying on a screenshot.

For API field definitions:

```html
<dl id="payment-intent-fields">
  <dt>amount</dt>
  <dd>Integer. Amount in minor currency units. Required.</dd>
  <dt>currency</dt>
  <dd>String. ISO 4217 code. Required.</dd>
  <dt>idempotency_key</dt>
  <dd>String. Client-generated UUID. Optional but strongly recommended.</dd>
</dl>
```

This is unambiguous in a way that a Markdown bullet list is not.

## Where Markdown Still Wins

Do not reach for HTML when you are in the discovery phase. Early specs are supposed to be messy. You are capturing intent, not formalizing a contract. Markdown's low friction is a feature during this stage.

Markdown also wins for anything that lives primarily in a GitHub pull request — review comments, ADRs, brief technical proposals. GitHub renders Markdown natively and inline. HTML in a PR diff is hostile to reviewers.

Single-author specs with a short implementation window do not need the overhead of explicit IDs and data attributes. The structural payoff only compounds over time and across teams.

## The Honest Tradeoff

Writing specs in HTML requires more discipline upfront. You will write more characters per section heading. Your spec files will not render prettily in a default GitHub blob view. You need to either host the spec somewhere or agree that the raw HTML is the authoritative source.

In return, you get a document that an AI agent can navigate like a database, cross-reference without ambiguity, and filter by ownership or status. For a complex system being implemented by autonomous agents over multiple sessions — exactly the SDD workflow — that is a meaningful structural advantage.

The format of your spec is a protocol between you and the agent implementing it. HTML is a richer protocol than Markdown. Whether the overhead is worth it depends entirely on the complexity and longevity of what you are building.

For a greenfield microservice with two engineers and a two-week timeline, write Markdown. For a multi-team platform with a six-month implementation arc and a spec that will outlive the engineers who wrote it, the structural investment in HTML pays for itself quickly.

