OpenAI's Agents SDK Gets Sandboxed Execution and a Model-Native Harness: The Agent Infrastructure Layer Is Now Table Stakes

Table of Contents

On April 15, OpenAI shipped what might be the most consequential developer tooling update it has released in 2026: a major overhaul of its Agents SDK that adds sandboxed execution environments, a model-native harness, and durable state management via snapshotting. It also works with any model, not just OpenAI’s.

Let’s unpack what shipped, why it matters, and what it reveals about where the agentic infrastructure market is heading.

What Actually Shipped
#

A Model-Native Harness
#

The new harness is the architectural centerpiece of the update. It gives agents:

Configurable memory — agents can persist state across tool calls and sessions in a structured way
Sandbox-aware orchestration — the harness knows what execution environment the agent is running in and coordinates accordingly
Codex-like filesystem tools — agents can inspect, read, write, and manipulate files as a first-class operation
Standardized integrations — a common interface for the primitives that are converging across frontier agent systems

The language in OpenAI’s announcement is precise: this is “alignment with how frontier models perform best.” That’s a concession that generic orchestration frameworks misfire on complex, long-running tasks. The harness is opinionated in the right direction — toward how large reasoning models actually behave in practice.

Native Sandbox Execution
#

The second major addition is native sandbox support. Agents can now run in controlled compute environments with the files, tools, and dependencies they need for a task. Developers have two paths:

Bring your own sandbox — if you already run E2B, Modal, Runloop, or another compute platform
Use a built-in integration — support for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel is wired in from day one

A new Manifest abstraction describes the agent’s workspace in a portable, provider-agnostic format. Swap sandbox providers without rewriting orchestration logic.

Security by Separation
#

This is the detail that enterprise security teams will care about. OpenAI explicitly designed the new architecture assuming prompt-injection and data exfiltration attempts will happen. Their solution: separate the harness from the compute environment.

Credentials never enter the environment where model-generated code executes. The harness side holds auth context; the sandbox side is treated as potentially hostile territory. This is a genuinely sound security model for production agent deployments — particularly for agents that touch databases, secrets managers, or production APIs.

The separation also enables durable execution: if a sandbox container fails or times out, the SDK snapshotshots agent state and rehydrates it in a fresh container. Long-running agentic tasks can survive infrastructure blips.

Provider-Agnostic by Default
#

Here’s the move most people buried: the updated Agents SDK works with any model that exposes a Chat Completions-compatible API endpoint. That’s over 100 models from third-party and open-source providers — including Anthropic’s Claude.

OpenAI built an agent orchestration layer that will run Claude. Let that land for a moment.

This is either an extremely confident play (“our models win on merit, so let developers compare”) or an ecosystem land-grab (“we become the standard SDK regardless of which model wins”). Probably both.

Coming soon: subagents (hierarchical multi-agent patterns, Python and TypeScript) and code mode (agents that write and execute code as a native workflow step). Python ships first; TypeScript follows.

The Primitives Are Now Table Stakes
#

The most interesting thing about this release is what it confirms about the direction of the entire market.

Everything OpenAI shipped here — sandboxed execution, configurable memory, filesystem tooling, durable state, harness/compute separation — has been part of Claude Code’s architecture since its initial design. Not as optional add-ons, but as core premises of how the tool was built.

This is what happens when a competitor moves from “AI assistant in your editor” to “agent that runs things autonomously.” The infrastructure requirements are identical regardless of which tool you pick. You need isolation (so runaway agents don’t destroy your repo). You need memory (so agents don’t repeat work). You need durable state (so long tasks survive restarts). You need security separation (so model-generated code can’t exfiltrate credentials).

OpenAI shipping this as an explicit SDK layer is validation that these requirements are non-negotiable for serious agentic workflows. It’s also a signal that the tooling layer — not just the model layer — is now a primary competitive surface.

Modular vs. Integrated: The Real Architectural Tradeoff
#

Claude Code and the new OpenAI Agents SDK embody different philosophies about how agent infrastructure should be delivered.

OpenAI’s approach: a modular SDK that you wire together. You choose your sandbox provider. You configure your harness. You bring your model. You assemble the pieces. This gives you flexibility — you can use the OpenAI SDK with Claude, or with a local Llama model, or with GPT-5. You own the architecture.

Claude Code’s approach: an opinionated, integrated system where the terminal, the agent runtime, the worktree isolation, the memory, and the cloud execution layer (Routines, Ultraplan) are designed to work together. You don’t configure; you use.

Which is better depends on your context. If you’re a platform team building a proprietary agent infrastructure at scale, the SDK model gives you the control you need. You can adapt it to your security posture, your cloud provider, your model strategy.

If you’re a developer or a small team trying to ship software faster, the integrated model wins every time. The operational overhead of selecting sandbox providers, wiring the harness, managing manifests, and testing security separation across your configuration is not “overhead you do once and forget.” It’s ongoing maintenance. Claude Code’s premise — that you should spend zero time thinking about agent infrastructure — is still the right one for the majority of use cases.

The SDK approach has a deeper problem: it converts agent infrastructure into an engineering project. Claude Code converts it into a tool you install and use.

What’s Missing
#

The new Agents SDK still lacks what Claude Code Routines provides natively: agents that run on the vendor’s infrastructure, on a schedule, without your machine being online. OpenAI’s sandboxed agents run in your chosen compute environment. You manage the compute; you pay for the compute; you maintain the sandbox integrations.

That’s fine for teams with dedicated infrastructure. It’s friction for individual developers or small teams who want agentic automation that just runs — the way a CI pipeline runs, not the way a self-hosted server runs.

The subagents feature (coming soon) is the other gap to watch. Multi-agent orchestration is where most of the interesting long-horizon coding work happens. Until subagents ship and prove out, the Agents SDK orchestration story is still single-threaded relative to what Claude Code’s team mode delivers today.

The Broader Picture
#

OpenAI’s Agents SDK update is good for developers and good for the ecosystem. More mature tooling, real security models, and provider-agnostic architecture lift all boats. If the SDK becomes a de facto standard for building agents, it reduces the proprietary lock-in risk for teams concerned about betting entirely on one provider’s tooling.

But the update also clarifies the competitive landscape rather than disrupting it. The primitives Claude Code pioneered are now being standardized. That’s validation. The question going forward isn’t whether agentic infrastructure needs sandboxing, memory, and durable state — everyone now agrees it does. The question is whether you want those primitives assembled by you (SDK) or delivered as an integrated system (Claude Code).

For most teams, the integrated answer is still the right one. The best infrastructure is the infrastructure you never have to think about.

Sources:

What Actually Shipped#

A Model-Native Harness#

Native Sandbox Execution#

Security by Separation#

Provider-Agnostic by Default#

The Primitives Are Now Table Stakes#

Modular vs. Integrated: The Real Architectural Tradeoff#

What’s Missing#

The Broader Picture#

Related