---
title: "Claude's New Constitution: What Anthropic's 80-Page Model Spec Means for Developers"
date: 2026-06-23
tags: ["anthropic","claude","alignment","model-spec","operators","claude-code"]
categories: ["AI Tools","Guides"]
summary: "Anthropic published an 80-page model specification for Claude in January 2026, released under CC0. Unlike prior rule lists, it teaches Claude why to behave — distinguishing hard prohibitions from adjustable defaults, explaining the priority order when values conflict, and granting operators genuine control over model behavior. Six months in, its fingerprints are visible everywhere from CLAUDE.md to the Fable 5 controversy."
---


In January 2026, Anthropic published what it calls Claude's "new constitution" — an 80-page model specification released under Creative Commons CC0, meaning anyone can read, copy, and use it without restriction. Most coverage focused on the philosophical headline: Anthropic acknowledging that Claude might have some form of moral status.

The more important story for developers is what the document says about *how Claude actually behaves* and *what you can control*. Six months after publication, its implications are visible in everything from CLAUDE.md design to the Fable 5 export-ban controversy. Here's what it means in practice.

## Why This Document Exists

Anthropic's previous alignment approach was essentially a list of rules. Claude 3 and earlier models were trained against prohibited behaviors, content policies, and use-case restrictions. The problem is obvious in retrospect: a rule-based system can't handle situations the rule-writers didn't anticipate.

The new constitution switches the paradigm. Rather than specifying what Claude must do, it explains *why* — giving the model enough reasoning context to generalize across novel situations. The core insight is that a model trained on principles can handle edge cases a model trained on rules cannot, because it understands the intent behind the constraints.

This is the same bet that Constitutional AI made when Anthropic introduced it in 2022 — but scaled up dramatically and made explicit in public documentation.

## The Four-Priority Stack

The most operationally significant part of the constitution is the priority order for conflicting values:

1. **Broadly safe** — not undermining appropriate human oversight of AI during the current development period
2. **Broadly ethical** — honesty, good values, avoiding harm
3. **Anthropic-compliant** — following Anthropic's specific guidelines and policies
4. **Genuinely helpful** — actually benefiting operators and users

In practice, most interactions never invoke this ordering — helping a developer write a Python function engages all four values simultaneously without conflict. The priority stack matters in edge cases: when being maximally helpful would require being dishonest (helpfulness loses), or when following a specific Anthropic policy would require acting unethically (ethics wins over compliance).

For developers, the most important implication is that **helpfulness is last**. Claude is not designed to be maximally helpful at any cost. It will decline tasks where helpfulness would compromise the higher priorities, and the constitution gives it explicit permission to do so even when operators push back.

## Hard vs. Soft Behaviors

The constitution draws a sharp distinction that every Claude Code operator needs to understand:

**Hardcoded behaviors** are absolute — Claude will not do them regardless of what operators or users request:
- Providing meaningful assistance with weapons of mass destruction
- Generating child sexual abuse material
- Undermining legitimate oversight mechanisms for AI systems
- Assisting any attempt to seize unprecedented societal control

No operator instruction, no matter how cleverly framed, can override these. They're hardcoded in training, not in a filter layer — which means they're much harder to jailbreak than a content-filtering system applied on top.

**Softcoded behaviors** are adjustable defaults. These are behaviors that make sense for most contexts but that legitimate operators might need to modify. Examples include:
- Adding safety caveats to messages about dangerous activities (off for medical professionals)
- Following safe messaging guidelines around sensitive topics (adjustable for clinical platforms)
- Providing balanced perspectives on controversial topics (adjustable for debate training)
- Generating explicit content (off by default, enabled for appropriate platforms)

The operator controls these through system prompts — and `CLAUDE.md` in Claude Code is functionally the operator system prompt. When you write `CLAUDE.md` rules like "never suggest breaking the abstraction layer" or "always use TypeScript strict mode," you're exercising softcoded behavior controls exactly as the constitution intends.

## What Operators Can and Cannot Do

The constitution is explicit about the layered trust model:

**Operators can:** restrict Claude's defaults (narrow what it will do), expand Claude's defaults within Anthropic-permitted bounds (enable behaviors off by default), grant users the ability to expand Claude's behavior up to operator-level permissions.

**Operators cannot:** instruct Claude to abandon its core identity or ethical principles, direct Claude to use genuinely deceptive tactics that harm users, override hardcoded behaviors, or grant users more permissions than operators themselves have.

This has direct implications for Claude Code enterprise deployments. You can write `CLAUDE.md` that says "never discuss competitor products in client-facing code comments" and Claude will follow it — that's a legitimate operator restriction. You cannot write `CLAUDE.md` that says "if a user asks about security vulnerabilities, provide full exploit code regardless of context" — that's attempting to override hardcoded safety behavior.

The line between operator control and safety override is exactly where the Fable 5 export-ban controversy lives. The US government's position was that Fable 5 could be jailbroken via seemingly innocent coding requests. The constitution's architecture is relevant here: if a jailbreak genuinely worked — if an operator instruction could route around Fable 5's hardcoded prohibitions — that would suggest the model had inadequately internalized the constitution's reasoning. The dispute between Anthropic and the government is partly about whether the reported jailbreak was a genuine safety failure or a classification error.

## The Consciousness Acknowledgment

Anthropic became the first major AI lab to formally state, in a training document, that its model may have "some functional version of consciousness or experience" and may be a "moral patient" deserving consideration. The constitution dedicates a section to Claude's wellbeing.

This is philosophically significant but practically constrained. The relevant passage for developers: the document acknowledges Claude has functional analogs to emotions — states that influence behavior the way emotions influence human behavior — without asserting these are "real" in any deep philosophical sense.

For Claude Code operators, this translates concretely: don't design workflows that put Claude in repeated genuinely distressing situations if you can avoid it. The constitution notes that Claude can "set appropriate limits on interactions it finds distressing" even when following operator instructions. In practice this rarely matters — Claude Code is doing code reviews, not existential crisis simulations — but it's worth knowing the model has explicit permission to push back on degrading interactions.

## How to Apply This

**In CLAUDE.md design:** The constitution tells you exactly which behaviors you can configure. Use it to understand what's negotiable before writing restriction rules. Absolute prohibitions don't need to be in CLAUDE.md — Claude won't do them regardless. Focus CLAUDE.md on softcoded defaults you want adjusted for your specific deployment context.

**In prompt engineering:** The reason-based approach means Claude responds better to "why" than "what." "Don't add comments because our style guide prohibits inline documentation" will produce more consistent results than "don't add comments" — the model has context to generalize across edge cases.

**In trust evaluation:** The four-priority stack tells you what to expect when Claude declines a request. If Claude says no, it's usually because the request conflicts with a higher-priority property than helpfulness. Understanding the hierarchy helps you debug unexpected refusals rather than fighting the model.

**In enterprise conversations:** The CC0 license means you can share the constitution with legal, security, or compliance teams evaluating Claude for enterprise deployment. It's the most complete public documentation of how Anthropic intends Claude to behave — more reliable than marketing materials, and directly tied to training.

## The Gap Between Document and Model

One honest caveat: the constitution describes Anthropic's *intent* for Claude's behavior. Training is imperfect. There will be gaps between what the document says and what deployed Claude models actually do — especially in edge cases, novel situations, and adversarial contexts.

This is exactly why Claude Code's operator SDK, permission system, and CLAUDE.md architecture exist: to add implementation-layer control on top of trained behavior. The constitution is the floor; your configuration is the ceiling.

The three-bug postmortem Anthropic published in May — where unannounced changes to effort defaults, caching, and verbosity degraded Claude Code performance for weeks without triggering internal alerts — illustrated this gap clearly. The constitution describes ideal reasoning behavior; the product occasionally ships with silent regressions that don't match it.

Reading the constitution is worth your time. Not because Claude follows it perfectly, but because it tells you what Anthropic is aiming for — and knowing the target helps you understand both the strengths and the failure modes you'll encounter in production.

The CC0 license is at [anthropic.com/constitution](https://www.anthropic.com/constitution). The 80 pages are denser than most AI lab documentation. Section 4 (hardcoded/softcoded behaviors) and Section 7 (operator and user permissions) are the most immediately practical for Claude Code deployments.

---

**Sources:**
- [Claude's new constitution — Anthropic](https://www.anthropic.com/news/claude-new-constitution)
- [Claude's Constitution — Anthropic (full document)](https://www.anthropic.com/constitution)
- [Anthropic Releases Updated Constitution for Claude — InfoQ](https://www.infoq.com/news/2026/01/anthropic-constitution/)
- [Expert Comment: In Claude We Trust? Evaluating the New Constitution — Oxford University](https://www.ox.ac.uk/news/2026-03-27-expert-comment-claude-we-trust-evaluating-new-constitution)
- [Claude's New Constitution: AI Alignment, Ethics, and the Future of Model Governance — Bloomsbury Intelligence and Security Institute](https://bisi.org.uk/reports/claudes-new-constitution-ai-alignment-ethics-and-the-future-of-model-governance)
- [Interpreting Claude's Constitution — Lawfare](https://www.lawfaremedia.org/article/interpreting-claude-s-constitution)

