OpenAI Agents SDK April 2026: Sandbox Agents, Harness Architecture, Any LLM

Abhishek GautamAbhishek Gautam6 min read
OpenAI Agents SDK April 2026: Sandbox Agents, Harness Architecture, Any LLM

Quick summary

OpenAI Agents SDK April 15 2026 update: sandbox agents with filesystem/Git/snapshot support, harness-compute separation, Codex tools, 100+ LLMs via Chat Completions API. Full developer guide.

OpenAI updated the Agents SDK on April 15, 2026, with a release TechCrunch described as helping enterprises build "safer, more capable agents." The update ships three architectural changes significant enough to affect how you build production agent systems: sandbox agents (a new execution environment for long-horizon tasks), harness-compute separation (control plane decoupled from execution), and broad LLM compatibility (100+ models via Chat Completions API, not just OpenAI models). This is not a minor version bump — it changes the recommended architecture for production agentic systems.

The official documentation is at openai.github.io/openai-agents-python. This post covers what changed, what the architectural decisions mean for production deployments, and what developers need to test immediately.

Sandbox Agents: What They Are and Why They Matter

The sandbox agent is the headlining feature of the April 2026 update. A sandbox agent operates inside a persistent, isolated workspace with access to:

  • Filesystem: files, directories, and mounts — agents can read, write, and navigate a real filesystem inside the sandbox
  • Git repositories: agents can clone repos, create branches, commit changes, and push — enabling full autonomous development workflows
  • Snapshots: the sandbox state can be saved and restored, allowing agents to checkpoint work and resume after interruption
  • Containers: the sandbox can run inside Docker containers or hosted environments, with support for local, containerised, and cloud-hosted execution

Before this update, building an agent that could reliably work on a multi-step coding task — read a codebase, make changes across multiple files, run tests, fix failures, commit — required hand-rolling the filesystem management, state persistence, and container orchestration. The sandbox agent provides this as a first-class SDK primitive.

The direct comparison is to Anthropic's Computer Use and the emerging agentic coding layer in tools like Cursor and Claude Code. OpenAI is now competing directly in the agent execution environment space, not just the model API space.

What sandbox agents enable that wasn't practical before:

  • Long-horizon coding tasks that span multiple tool calls and file operations without state loss
  • Autonomous code review and refactoring agents that can run tests and verify their own changes
  • Documentation generation agents that read an entire codebase and produce output without context window limits
  • CI/CD agents that can diagnose failures, write fixes, and open PRs autonomously

Harness-Compute Separation: The Architecture Decision

The April 2026 SDK introduces the concept of separating the harness (control plane) from the compute (execution plane). This is an architectural principle that matters for production deployments.

Harness (control plane): The logic that orchestrates the agent — what task to run, which tools to call, how to handle errors, when to hand off to a human. The harness manages the agent's decision loop.

Compute (execution plane): The actual execution environment where the agent runs its tools — the sandbox, the filesystem, the container, the shell. The compute plane is where side effects happen.

In the previous SDK architecture, harness and compute were tightly coupled — the agent orchestration and the tool execution happened in the same process. The April 2026 update separates them, enabling:

  • Remote compute: run the execution environment on a different machine (or cloud instance) from the orchestration logic
  • Compute scaling: scale the execution environment independently of the orchestration layer
  • Audit and compliance: log all compute-layer actions independently of the orchestration — every filesystem read/write, every Git commit, every shell command is auditable at the execution plane level without modifying harness logic
  • Multi-agent coordination: multiple harness instances can share a compute environment, enabling agent collaboration on the same workspace

For enterprise deployments where security and auditability are requirements, harness-compute separation is the architecture you need. A financial services company building a code review agent, for example, can run the harness on its existing infrastructure and the compute in an isolated VPC with full audit logging — without changing the orchestration logic.

Codex-Style Filesystem Tools

The update ships with Codex-like filesystem tools as first-class SDK primitives. These include:

  • read_file / write_file / list_directory — basic filesystem access
  • run_command — execute shell commands in the sandbox
  • edit_file — apply targeted edits to specific file sections (not full rewrites)
  • search_files — semantic and literal search across a codebase
  • git_* tools — clone, branch, commit, diff, push operations

These are the same primitives that OpenAI's Codex uses internally. Making them available in the SDK means developers can build Codex-equivalent agents for their own codebases without accessing the Codex product specifically — you compose these tools with your own orchestration logic.

The developer implication: if you were hand-rolling filesystem tool implementations for your agent (reading files with custom functions, running subprocesses, managing git state), you can replace that with the SDK-native implementations which have been tested at OpenAI production scale.

Any LLM via Chat Completions API

The April 2026 SDK update makes the Agents SDK compatible with any model that exposes a Chat Completions-compatible API endpoint. The compatible model list now includes 100+ non-OpenAI models.

This includes:

  • Anthropic Claude (via the Anthropic API — it now exposes a Chat Completions-compatible endpoint)
  • Google Gemini (via the Gemini API)
  • Meta Llama 3.x (via Groq, Fireworks, Together AI, or self-hosted)
  • Mistral, Qwen, DeepSeek V4 Pro (via Mistral API, QwenAPI, or self-hosted)
  • Any model hosted on Azure OpenAI Service (including non-OpenAI models via Azure AI Foundry)

What this means in practice:

  • You can build an Agents SDK orchestration that uses Claude for reasoning, GPT-4o for tool calls, and a local Llama model for document processing — mixing models based on cost, capability, or compliance requirements
  • You can use the Agents SDK for production deployments where OpenAI models are blocked by compliance policy (some EU financial institutions cannot use US-hosted AI APIs) by routing to Azure-hosted models or self-hosted alternatives
  • You can benchmark your agent system across models without rewriting orchestration logic — swap the model, keep the harness

The DeepSeek V4 Huawei angle is relevant here: DeepSeek V4 Pro, open-source with public weights, is a Chat Completions-compatible model. You can run it self-hosted on Huawei hardware (in China) or on any GPU infrastructure and use it with the OpenAI Agents SDK orchestration layer. The SDK becomes model-agnostic infrastructure.

Python Now, TypeScript Later

The new harness and sandbox capabilities ship in Python first. TypeScript support is planned for a later release date not yet announced.

The implications for your stack:

  • If you are building agent systems in Python, the April 2026 update is immediately available — upgrade the openai-agents package and start using sandbox agents
  • If you are building in TypeScript/Node.js, the existing SDK capabilities are available but sandbox agents and harness-compute separation are not yet in the TypeScript package — plan for migration or maintain a Python orchestration layer until TypeScript support ships

Pricing

The April 2026 Agents SDK update uses standard API pricing. Sandbox agent execution does not carry a separate fee above model API calls — you pay for token usage (input/output) at your standard tier rates. Shell command execution, filesystem operations, and Git operations within the sandbox are free.

The compute cost to watch is the GPU/compute cost if you run self-hosted models. The SDK itself is free; the hosted sandbox environments OpenAI provides are billed at standard API rates; running a 1.6T parameter DeepSeek model yourself to use with the SDK is where your GPU bill comes from.

What to Test First

If you have an existing Agents SDK integration:

  1. Upgrade: run "pip install --upgrade openai-agents"
  2. Check the release changelog for breaking changes in harness API
  3. If you have custom filesystem tools, compare them against SDK-native implementations and migrate if the SDK version covers your use case

For new agent builds:

  1. Start with the sandbox agent examples in the official docs
  2. Test the harness-compute separation with a remote compute setup if you have compliance or audit requirements
  3. Test the any-LLM compatibility if your deployment has model restrictions

For enterprise teams evaluating agentic coding:

The April 2026 SDK is the first version where building a production coding agent with the OpenAI SDK is architecturally sound rather than a prototype. The sandbox persistence, harness auditability, and Codex filesystem tools address the three main production blockers from previous versions.

Key Takeaways

  • Sandbox agents: persistent isolated workspace with filesystem, Git, snapshots, container support — enables long-horizon coding tasks without hand-rolling state management
  • Harness-compute separation: control plane decoupled from execution; enables remote compute, audit logging, multi-agent coordination, and compliance-grade deployments
  • Codex filesystem tools: read_file, write_file, run_command, edit_file, git_* — same primitives Codex uses, now in the SDK
  • 100+ LLMs supported: any Chat Completions-compatible API — Claude, Gemini, Llama, DeepSeek V4 Pro, self-hosted models all work with OpenAI Agents SDK orchestration
  • Python now, TypeScript later: sandbox and harness features in Python package immediately; TypeScript timeline not announced
  • Standard pricing: no additional fee for sandbox execution; pay standard token rates for model API calls

Official documentation: openai.github.io/openai-agents-python

For the DeepSeek V4 Pro model that now works with this SDK, read DeepSeek V4 Pro: 1.6T Parameters, Beats Claude on Coding. For the broader agentic coding context, read OpenAI GPT-5.5: Agentic Coding Upgrade. For the AI model landscape this SDK now supports, read Stanford HAI 2026: China Erased 97% of US AI Lead.

FAQ

Frequently Asked Questions

What is the OpenAI Agents SDK and what changed in the April 2026 update?

The OpenAI Agents SDK is the official Python and TypeScript library for building production AI agent systems using OpenAI models and tools. The April 15, 2026 update introduced three major features: sandbox agents (persistent isolated execution environments with filesystem, Git, and container support for long-horizon tasks), harness-compute separation (decoupled control plane and execution plane for auditable enterprise deployments), and compatibility with 100+ non-OpenAI LLMs via Chat Completions-compatible API endpoints. The update also ships Codex-style filesystem tools as first-class SDK primitives. Official documentation: openai.github.io/openai-agents-python

What are OpenAI Agents SDK sandbox agents and how do they work?

Sandbox agents in the April 2026 OpenAI Agents SDK update are agents that execute inside a persistent, isolated workspace. The sandbox provides filesystem access (read, write, list directories), Git repository operations (clone, branch, commit, push), snapshot support (save and restore sandbox state for resumable long-horizon tasks), and container support (run inside Docker or hosted environments). Before this update, building an agent that could reliably work across multiple files and maintain state between tool calls required custom infrastructure. The sandbox provides this as a first-class SDK primitive — the same execution environment OpenAI's Codex product uses internally.

Can I use Claude, Gemini, or DeepSeek with the OpenAI Agents SDK?

Yes. The April 2026 update makes the OpenAI Agents SDK compatible with any model that exposes a Chat Completions-compatible API endpoint — over 100 non-OpenAI models are now supported. This includes Anthropic Claude (via Anthropic API's Chat Completions endpoint), Google Gemini (via Gemini API), Meta Llama 3.x (via Groq, Fireworks, Together AI, or self-hosted), Mistral, Qwen, DeepSeek V4 Pro (self-hosted or via third-party APIs), and any model on Azure OpenAI Service or Azure AI Foundry. You can mix models in a single agent system — different models for different tools or reasoning steps — without rewriting orchestration logic.

What is harness-compute separation in the OpenAI Agents SDK?

Harness-compute separation is an architectural pattern introduced in the April 2026 SDK update that decouples the agent's orchestration logic (harness/control plane) from its execution environment (compute/execution plane). The harness manages the agent's decision loop — which tasks to run, which tools to call, how to handle errors. The compute plane is where tools actually execute — filesystem operations, shell commands, Git operations. Separating them enables running the execution environment on different infrastructure from the orchestration, independent scaling of each layer, compliance-grade audit logging of all compute-layer actions without modifying orchestration code, and multi-agent coordination where multiple harnesses share one compute environment.

Free Weekly Briefing

The AI & Dev Briefing

One honest email a week — what actually matters in AI and software engineering. No noise, no sponsored content. Read by developers across 30+ countries.

No spam. Unsubscribe anytime.

Free Tool

Will AI replace your job?

4 questions. Get a personalised developer risk score based on your stack, role, and what you actually build day to day.

Check Your AI Risk Score →

Written by

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 859+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.