Legal AI company Harvey had a problem that every team running autonomous agents eventually hits: the agent kept making the same mistakes. A workaround that a human had corrected in session 47 was gone by session 48. A tool usage pattern that the team had optimized was invisible to the next run. Institutional knowledge evaporated every time the context window closed.
After deploying Anthropic’s new Dreaming feature for Claude Managed Agents, Harvey’s task completion rate increased roughly 6×. That is not a modest improvement to a success metric — it is a structural shift in what the agent is capable of, compounding across every subsequent run.
Dreaming shipped on May 6 at Anthropic’s Code with Claude SF event, alongside two related features: Outcomes (rubric-based success evaluation) and Multiagent Orchestration (a coordinator/specialist model for decomposing large tasks). Together they move Claude Managed Agents from a capable agent loop to something closer to a continuously improving system.
## What Dreaming Actually Does
Memory — launched earlier this year in public beta — stores facts across sessions: preferences, project state, file structures, user-specific context. It is a notepad the agent can write to and read from.
Dreaming operates at a higher level of abstraction. It is a scheduled background process that reviews the agent’s past sessions and memory stores, extracts patterns across them, and curates memories so the agent improves over time. It surfaces things a single session cannot see on its own: recurring mistakes across dozens of runs, workflows the agent consistently converges on, preferences that show up across a team’s sessions.
When a dream runs, it reads the existing memory store alongside past session transcripts, then produces a new, reorganized memory store:
- Duplicate entries merged into single canonical facts
- Stale or contradicted knowledge replaced with the latest validated value
- New patterns surfaced as explicit learnings the next session will act on
The output is written as plain-text notes and structured playbooks — not embedded weights, not opaque vectors. Every insight is readable, auditable, and correctable by a human. You can see exactly what the agent learned and why.
For Harvey’s legal agents — handling long-form drafting, multi-document review, and complex legal research — the patterns Dreaming surfaced included filetype workarounds, tool-specific usage sequences, and recurring failure modes that no single session had enough context to detect. The agents now arrive at each session pre-loaded with the institutional knowledge the team had built up, rather than starting from scratch.
## Control, Not Autopilot
One important design decision: Dreaming does not have to be automatic.
You decide how much control you want. The two modes:
- Auto-update: Dreaming runs on a schedule, updates memory, and the next session benefits immediately. Low friction, suited for established agents with validated memory stores.
- Manual review: Dreaming produces a candidate memory update. A human reviews the proposed changes before they land. Higher friction, appropriate during the early stages when you are still calibrating what the agent should and should not retain.
This is a meaningful distinction. Automatic memory updates in a production agentic system carry real risk: a bad dream could propagate a systematic mistake at scale. The manual review mode gives teams an audit layer, which matters especially in regulated industries like the ones Harvey operates in, where the agent's reasoning chain needs to be defensible.
Schedules are configured through the Managed Agents API; the Dreams API docs cover the YAML structure and available trigger windows.
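As a sketch of what a schedule might look like, with every field name assumed for illustration rather than taken from the documented schema:

```yaml
# Hypothetical dream schedule -- field names are illustrative, not the
# documented schema; the Dreams API docs define the real YAML structure.
dreams:
  schedule: "0 3 * * *"      # cron-style trigger: nightly, outside peak hours
  mode: manual_review        # or auto_update once the memory store is validated
  sources:
    - memory_store
    - session_transcripts
  lookback_sessions: 50      # how much history each dream reads
```

The mode field is where the auto-update versus manual-review choice described above would live.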
## Outcomes: Defining Done Before the Agent Starts
The second feature — Outcomes — solves a different problem. Most agentic workflows define a task, run the agent, and then evaluate the result by eye. That works fine for a demo. It does not scale to production agents running thousands of tasks per week.
Outcomes lets you write a rubric describing what success looks like before the agent starts. The agent then works toward that rubric. When it finishes, a separate grader evaluates the output against your criteria — in its own context window, so it is not influenced by the agent’s reasoning or any in-session confirmation bias.
When the grader finds the output does not meet the rubric, it pinpoints what needs to change and sends the agent back for another pass. You define the success criteria once; the agent iterates until it meets them.
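Anthropic's docs define the exact request shape; as a minimal sketch, assuming a sessions endpoint that accepts a rubric and a retry budget (the path and every field name here are hypothetical):

```python
# Sketch: starting a session with an Outcomes rubric.
# Endpoint path and field names are assumptions for illustration only.
import os
import requests

API = "https://api.anthropic.com/v1"
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-beta": "managed-agents-2026-04-01beta",
}

session = requests.post(
    f"{API}/sessions",  # hypothetical path
    headers=HEADERS,
    json={
        "task": "Draft a summary judgment motion from the attached case file.",
        "outcome": {
            # The rubric the separate grader evaluates against,
            # in its own context window.
            "rubric": [
                "Every cited case appears in the provided case file",
                "All required sections are present: facts, standard, argument",
                "No paragraph exceeds 150 words",
            ],
            "max_retries": 3,  # grader-driven passes before giving up
        },
    },
).json()
```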
You can also add a webhook: `POST /sessions/{id}/outcome` notifies your system when the agent completes (or exhausts its retry budget). No polling. The agent runs, the webhook fires, you process the result.
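On the receiving side, the handler can be a few lines. A minimal sketch, assuming a JSON payload with `session_id` and `status` fields (both names hypothetical):

```python
# Sketch: receiving the outcome webhook. The payload fields shown
# (session_id, status) are assumptions, not documented names.
from flask import Flask, request

app = Flask(__name__)

def ship(session_id: str) -> None:
    # Hypothetical downstream step: publish the approved output.
    print(f"shipping result for session {session_id}")

def escalate(session_id: str) -> None:
    # Hypothetical: route to human review once retries are exhausted.
    print(f"escalating session {session_id}")

@app.post("/hooks/claude-outcome")
def claude_outcome():
    event = request.get_json()
    # Assumed payload shape: {"session_id": "...", "status": "passed" | "failed"}
    if event["status"] == "passed":
        ship(event["session_id"])
    else:
        escalate(event["session_id"])
    return "", 204
```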
This is a production pattern, not a convenience feature. Teams that are building SLA-bound agentic workflows — “this task must meet these criteria before it ships” — have had to implement this evaluation loop themselves until now. Outcomes brings it into the managed layer.
## Multiagent Orchestration: Coordinator + Specialists
The third feature is the most architecturally interesting. When a task is too large or too heterogeneous for a single agent to handle well, Multiagent Orchestration lets a lead agent break the job into pieces and delegate each to a specialist with its own model, prompt, and tools.
The canonical example from Anthropic’s documentation is an incident investigation: a lead agent runs the overall investigation while subagents fan out in parallel through deploy history, error logs, metrics dashboards, and support tickets. The specialists work simultaneously on a shared filesystem and contribute findings to the lead agent’s overall context.
Key constraints (a request sketch follows the list):
- Up to 20 unique agents per multiagent session (the coordinator plus up to 19 specialists)
- Each specialist can have its own model, system prompt, and tool set — you are not restricted to a homogeneous fleet
- Specialists write to a shared filesystem; the coordinator aggregates
- No separate access request required; available via the Claude Platform API with the `managed-agents-2026-04-01beta` header
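Putting those constraints together, a session request might look something like this. The endpoint path, body fields, and model names are assumptions for illustration; only the beta header comes from the announcement:

```python
# Sketch: a coordinator + specialists session for an incident investigation.
# The /multiagent-sessions path, body fields, and model names are assumed.
import os
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/multiagent-sessions",  # hypothetical path
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-beta": "managed-agents-2026-04-01beta",
    },
    json={
        "coordinator": {
            "model": "claude-opus-4",  # placeholder model names throughout
            "system": "Lead the incident investigation; delegate and synthesize.",
        },
        # Up to 19 specialists, each with its own model, prompt, and tools.
        "specialists": [
            {"name": "deploys", "model": "claude-sonnet-4", "tools": ["git"]},
            {"name": "logs",    "model": "claude-sonnet-4", "tools": ["grep"]},
            {"name": "metrics", "model": "claude-haiku-4",  "tools": ["http"]},
        ],
        "shared_filesystem": True,  # specialists write findings for the coordinator
    },
)
resp.raise_for_status()
```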
The 20-agent limit is not a technical ceiling — it is a guardrail while the feature is in public beta. Anthropic will almost certainly adjust it based on what production deployments actually need.
## Why This Matters Beyond the Headline
The Harvey 6× number is striking, but the structural implication is bigger than any single metric.
Until now, Claude Managed Agents was a capable, stateful loop. You got persistence across sessions (memory), sandboxed execution, checkpointing, and solid infrastructure. What you did not get was an agent that could learn from its own history in a systematic, auditable way — or a mechanism to define and enforce success criteria automatically.
Dreaming + Outcomes + Multiagent Orchestration close three of the main gaps between “capable prototype” and “production-grade autonomous system”:
| Gap | Feature that closes it |
|---|---|
| Agent forgets what it learned | Dreaming: scheduled memory curation |
| Success is defined by eyeballing | Outcomes: explicit rubric + grader loop |
| Single agent can’t handle complex parallel work | Multiagent: coordinator + specialist fleet |
Together they describe a system that can improve over time (Dreaming), enforce its own quality bar (Outcomes), and scale its capacity horizontally (Multiagent) — without requiring a human in the loop for each iteration.
## How to Think About It in Practice
If you are running Claude Managed Agents today, here is the sequencing that makes sense:
Start with Outcomes if you have a production workflow with a definable success criterion. This is the lowest-risk addition: you are just specifying what “done” means and letting the agent iterate toward it, with a webhook notification when it gets there.
Add Dreaming once you have enough session history for it to be useful — typically after the agent has run 20-50 sessions with meaningful memory writes. Enable manual review mode first. Review a few dream outputs by hand before switching to auto-update.
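If you want to script that review step, a sketch of the loop, assuming endpoints for listing and approving candidate updates (all paths and fields hypothetical):

```python
# Sketch of a manual-review loop: fetch candidate dream updates, eyeball the
# proposed changes, then approve or reject. All paths and fields are assumed.
import os
import requests

API = "https://api.anthropic.com/v1/agents/my-agent"  # hypothetical agent path
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-beta": "managed-agents-2026-04-01beta",
}

pending = requests.get(f"{API}/dreams?status=pending", headers=HEADERS).json()
for dream in pending["dreams"]:
    print(dream["proposed_changes"])  # plain-text diff of the memory store
    verdict = input("apply this update? [y/N] ")
    action = "approve" if verdict == "y" else "reject"
    requests.post(f"{API}/dreams/{dream['id']}/{action}", headers=HEADERS)
```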
Add Multiagent Orchestration when a single agent is hitting token or time limits on a given class of tasks, or when you have genuinely parallel workstreams that benefit from specialization (legal research + document drafting + citation verification can run simultaneously rather than sequentially).
Combine all three for the full flywheel: multiagent sessions generate more session data for Dreaming to work with; Dreaming curates learnings that improve the specialists; Outcomes catches regressions before they compound.
## The Bigger Picture
Dreaming is Anthropic’s answer to a question the industry has been asking for a year: how do you make an agentic system that actually gets better over time, without retraining the model?
The answer is not fine-tuning and not RL on production traffic — both are expensive, opaque, and hard to govern. It is a scheduled background process that reads past behavior, synthesizes it into plain-text learnings, and writes those learnings back to the memory store that the next session reads. Observable, auditable, reversible.
That is the design philosophy Claude Code has always taken: terminal-native, filesystem-transparent, inspectable at every step. Dreaming extends the same philosophy up the stack to the agent memory layer.
Harvey’s 6× completion rate will not be universal — it reflects a specific agent architecture (long-form legal drafting, complex tool chains, high session volume) where memory curation pays off quickly. Your numbers will depend on your task structure and session volume. But the direction is clear: agents that can learn from their own history are not just more convenient — they are fundamentally more capable in ways that compound.
The question is not whether to use Dreaming. The question is how quickly you can accumulate enough session history to make it valuable.
Sources:
- New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration — Anthropic
- Dreams — Claude API Docs
- Claude Managed Agents overview — Claude API Docs
- Anthropic introduces “dreaming,” a system that lets AI agents learn from their own mistakes — VentureBeat
- Anthropic is letting Claude agents ‘dream’ so they don’t sleep on the job — SiliconANGLE
- Anthropic will let its managed agents dream — The New Stack
- Claude agents can now dream — XDA Developers