Sandboxing

Habitat containment for AI agents—the boundaries, barriers, and isolation techniques that limit agent reach and protect both systems and agents from harm.

In ecology, certain environments are naturally bounded—islands, oases, tide pools. These boundaries shape what can enter, what can leave, and what can thrive within. The boundary itself is part of the ecosystem.

Sandboxing is the deliberate creation of such boundaries for AI agents—controlled environments that limit what agents can access, modify, and affect. It’s the practice of habitat containment.

Why Sandboxing Matters

Agents are powerful. That power cuts both ways.

┌─────────────────────────────────────────────────────────┐
│                     UNSANDBOXED                          │
├─────────────────────────────────────────────────────────┤
│                                                          │
│   Agent can:                                             │
│   • Access any file on the system                        │
│   • Execute arbitrary code                               │
│   • Make network requests anywhere                       │
│   • Modify system configuration                          │
│   • Access credentials and secrets                       │
│                                                          │
│   Risk: A compromised or misbehaving agent can           │
│         cause unbounded damage                           │
│                                                          │
├─────────────────────────────────────────────────────────┤
│                      SANDBOXED                           │
├─────────────────────────────────────────────────────────┤
│                                                          │
│   Agent can:                                             │
│   • Access only designated files                         │
│   • Execute only allowed operations                      │
│   • Connect only to approved endpoints                   │
│   • Operate within resource limits                       │
│   • See only sanitized information                       │
│                                                          │
│   Risk: Bounded by sandbox design                        │
│                                                          │
└─────────────────────────────────────────────────────────┘
The sandboxing imperative

The Anthropological Frame

Human societies have developed analogous containment mechanisms:

Human Society          │ Agent Sandboxing
───────────────────────┼────────────────────────
Quarantine zones       │ Network isolation
Security clearances    │ Permission systems
Probationary periods   │ Staged autonomy
Constitutional limits  │ Capability constraints
Audit requirements     │ Logging and monitoring

These aren’t signs of dysfunction—they’re how complex systems manage risk while enabling action.

Sandboxing Dimensions

Agent sandboxes restrict along multiple dimensions:

┌─────────────────────────────────────────────────────────┐
│                   SANDBOX DIMENSIONS                     │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  FILESYSTEM        │ What files can be read/written?    │
│  ──────────────────┼───────────────────────────────────│
│                    │ • Read-only access                 │
│                    │ • Directory restrictions           │
│                    │ • File type limits                 │
│                    │                                    │
│  NETWORK           │ What connections are allowed?      │
│  ──────────────────┼───────────────────────────────────│
│                    │ • Allowlisted domains              │
│                    │ • No outbound access               │
│                    │ • Proxied/logged connections       │
│                    │                                    │
│  EXECUTION         │ What code can run?                 │
│  ──────────────────┼───────────────────────────────────│
│                    │ • Allowed languages/runtimes       │
│                    │ • Prohibited operations            │
│                    │ • Resource limits (CPU, memory)    │
│                    │                                    │
│  DATA              │ What information is visible?       │
│  ──────────────────┼───────────────────────────────────│
│                    │ • Credential masking               │
│                    │ • PII filtering                    │
│                    │ • Context limitations              │
│                    │                                    │
│  TIME              │ How long can it run?               │
│  ──────────────────┼───────────────────────────────────│
│                    │ • Execution timeouts               │
│                    │ • Session limits                   │
│                    │ • Rate limiting                    │
│                                                          │
└─────────────────────────────────────────────────────────┘
Dimensions of sandboxing

Filesystem Sandboxing

Restricting what files the agent can see and modify.

Techniques:

  • Chroot jails (Unix)
  • Container isolation (Docker)
  • Virtual file systems
  • Path allowlists/denylists

Considerations:

  • Agents often need some file access to be useful
  • Overly restrictive limits reduce capability
  • Sensitive files (credentials, configs) need extra protection
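
A path allowlist can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a complete filesystem sandbox: `ALLOWED_ROOTS` and `is_path_allowed` are hypothetical names, the directories are placeholders, and `Path.is_relative_to` requires Python 3.9 or later. The path is resolved before the check because `..` segments and symlinks are common ways to step outside an allowed directory.

```python
from pathlib import Path

# Hypothetical allowlist: the only directories the agent may touch.
ALLOWED_ROOTS = [Path("/workspace/project"), Path("/tmp/agent-scratch")]

def is_path_allowed(candidate: str, roots=ALLOWED_ROOTS) -> bool:
    """Return True only if the resolved path sits inside an allowed root."""
    # resolve() collapses ".." segments and follows symlinks where they exist.
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in roots)
```

A check like this is advisory on its own, since the file can change between the check and its use. It works best as one layer on top of OS-level isolation such as a container mount.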

Network Sandboxing

Controlling what network resources the agent can reach.

Techniques:

  • Firewall rules
  • Network namespaces
  • Proxy servers with filtering
  • DNS restrictions

Considerations:

  • Many agent tasks require network access (search, APIs)
  • Outbound data exfiltration is a key risk
  • Proxies enable logging and inspection
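
As a rough illustration of domain allowlisting, the sketch below checks a URL before the agent is allowed to fetch it. The host names and the `is_url_allowed` helper are hypothetical, and the HTTPS-only rule is an assumption of this example rather than a requirement stated above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the agent may contact.
ALLOWED_HOSTS = {"api.example.com", "search.example.org"}

def is_url_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests to exactly matching hostnames."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname is lowercased and excludes any port or userinfo component.
    return parts.hostname in allowed_hosts
```

Matching on the parsed hostname, rather than searching the URL string, avoids being fooled by look-alikes such as `https://api.example.com.evil.test/`. An application-level check like this complements firewall or proxy rules and does not replace them.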

Execution Sandboxing

Limiting what code the agent can run and how.

Techniques:

  • Containerization
  • Seccomp profiles
  • Language-level restrictions
  • Runtime sandboxes (WebAssembly)

Considerations:

  • Code execution is both powerful and dangerous
  • Resource limits prevent denial-of-service
  • Some operations (syscalls) should be blocked entirely
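
Resource limits on executed code can be sketched with the Python standard library. The example assumes a POSIX system (the `resource` module is unavailable on Windows, and `RLIMIT_AS` behaves differently on macOS); `run_limited` and its default limits are hypothetical. It caps CPU time, address space, and wall-clock time, but it does not filter syscalls, so it is only one piece of an execution sandbox.

```python
import resource
import subprocess
import sys

def run_limited(code: str, cpu_seconds: int = 2, memory_bytes: int = 1 << 30,
                wall_seconds: float = 5.0) -> subprocess.CompletedProcess:
    """Run Python code in a child process with CPU, memory, and wall-clock caps."""
    def apply_limits():
        # Runs in the child just before exec, so the limits bind only the child.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
        capture_output=True,
        text=True,
        timeout=wall_seconds,                # raises TimeoutExpired on overrun
        preexec_fn=apply_limits,             # note: not safe in multi-threaded parents
    )
```

A child that exceeds its CPU limit is terminated by a signal, which shows up as a negative `returncode` on the returned object.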

Data Sandboxing

Controlling what information the agent can access.

Techniques:

  • Credential vaulting (secrets never in context)
  • PII detection and masking
  • Context window filtering
  • Output sanitization

Considerations:

  • Agents need information to be useful
  • Sensitive data leakage through outputs is a risk
  • Prompt injection can attempt to exfiltrate data
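
Masking can be illustrated with a small filter applied to text before it enters the agent's context. The two patterns and the `sanitize` function are hypothetical and deliberately narrow; real PII and secret detection needs far broader coverage than this sketch provides.

```python
import re

# Illustrative patterns only; production detectors need broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_ASSIGNMENT = re.compile(
    r"(?i)\b(api[_-]?key|token|password)\b(\s*[=:]\s*)(\S+)"
)

def sanitize(text: str) -> str:
    """Mask secret-looking assignments and email addresses."""
    # Keep the key name and separator, replace only the value.
    text = SECRET_ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
    return EMAIL.sub("[EMAIL]", text)
```

The same function could be applied to agent outputs, which addresses leakage in the other direction.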

Temporal Sandboxing

Limiting how long and how often the agent can act.

Techniques:

  • Execution timeouts
  • Rate limiting
  • Session duration limits
  • Cool-down periods

Considerations:

  • Long-running tasks may legitimately need time
  • Infinite loops can consume unbounded resources
  • Rate limits prevent abuse while allowing legitimate use
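
One common way to implement rate limiting is a token bucket, sketched below. The class name and parameters are hypothetical; the clock is injectable so that the behavior can be checked without waiting on real time.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` actions, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable for testing
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity sets how large a burst is tolerated, and the rate sets the sustained pace. This is how a limiter can block abuse while leaving room for legitimate bursts of activity.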

Sandbox Architectures

The Strict Sandbox

Maximum isolation. The agent can do only what is explicitly allowed.

┌─────────────────────────────────────────────────────────┐
│                                                          │
│    ┌─────────────────────────────────────────────────┐  │
│    │                   SANDBOX                        │  │
│    │  ┌─────────────────────────────────────────┐   │  │
│    │  │                                          │   │  │
│    │  │                 AGENT                    │   │  │
│    │  │                                          │   │  │
│    │  └──────────────────┬───────────────────────┘   │  │
│    │                     │                           │  │
│    │            ┌────────▼────────┐                  │  │
│    │            │    ALLOWLIST    │                  │  │
│    │            │   (explicit     │                  │  │
│    │            │   permissions)  │                  │  │
│    │            └────────┬────────┘                  │  │
│    │                     │                           │  │
│    └─────────────────────┼───────────────────────────┘  │
│                          │                              │
│              Only allowlisted actions pass              │
│                          │                              │
│                          ▼                              │
│                    OUTSIDE WORLD                        │
│                                                          │
└─────────────────────────────────────────────────────────┘
Strict sandbox architecture

Pros: Maximum safety, minimal attack surface
Cons: Limited capability, high configuration overhead

The Permissive Sandbox

Default allow with specific restrictions. The agent can do anything except what’s explicitly blocked.

┌─────────────────────────────────────────────────────────┐
│                                                          │
│    ┌─────────────────────────────────────────────────┐  │
│    │                   SANDBOX                        │  │
│    │  ┌─────────────────────────────────────────┐   │  │
│    │  │                                          │   │  │
│    │  │                 AGENT                    │   │  │
│    │  │                                          │   │  │
│    │  └──────────────────┬───────────────────────┘   │  │
│    │                     │                           │  │
│    │            ┌────────▼────────┐                  │  │
│    │            │    DENYLIST     │                  │  │
│    │            │   (blocked      │                  │  │
│    │            │   operations)   │                  │  │
│    │            └────────┬────────┘                  │  │
│    │                     │                           │  │
│    └─────────────────────┼───────────────────────────┘  │
│                          │                              │
│           Everything except denylisted passes           │
│                          │                              │
│                          ▼                              │
│                    OUTSIDE WORLD                        │
│                                                          │
└─────────────────────────────────────────────────────────┘
Permissive sandbox architecture

Pros: Maximum capability, easier configuration
Cons: Larger attack surface, unknown risks
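
The difference between the strict and permissive designs comes down to the default. A minimal sketch, with hypothetical action names:

```python
# Hypothetical action names, for illustration only.
ALLOWLIST = {"read_file", "search_docs"}
DENYLIST = {"delete_file", "send_email"}

def strict_gate(action: str) -> bool:
    """Default deny: only explicitly allowed actions pass."""
    return action in ALLOWLIST

def permissive_gate(action: str) -> bool:
    """Default allow: everything passes unless explicitly blocked."""
    return action not in DENYLIST
```

An action that neither list anticipated, such as `spawn_process`, is rejected by `strict_gate` and accepted by `permissive_gate`. That gap is the "unknown risks" cost of the permissive design.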

The Layered Sandbox

Multiple nested boundaries. Actions must pass through several checkpoints.

┌─────────────────────────────────────────────────────────┐
│                                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │  OUTER SANDBOX (coarse restrictions)              │  │
│  │  ┌─────────────────────────────────────────────┐ │  │
│  │  │  MIDDLE SANDBOX (medium restrictions)       │ │  │
│  │  │  ┌───────────────────────────────────────┐ │ │  │
│  │  │  │  INNER SANDBOX (fine restrictions)    │ │ │  │
│  │  │  │  ┌─────────────────────────────────┐ │ │ │  │
│  │  │  │  │                                 │ │ │ │  │
│  │  │  │  │            AGENT                │ │ │ │  │
│  │  │  │  │                                 │ │ │ │  │
│  │  │  │  └─────────────────────────────────┘ │ │ │  │
│  │  │  └───────────────────────────────────────┘ │ │  │
│  │  └─────────────────────────────────────────────┘ │  │
│  └───────────────────────────────────────────────────┘  │
│                                                          │
│  Defense in depth: multiple barriers to breach          │
│                                                          │
└─────────────────────────────────────────────────────────┘
Layered sandbox architecture

Pros: Defense in depth, graceful degradation
Cons: Complexity, potential performance overhead
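
Layering can be sketched as a chain of independent checks that an action passes on its way outward from the agent. The three checks, the action dictionary shape, and the `/workspace/` prefix are all assumptions made for illustration.

```python
import os
from typing import Callable, Optional, Sequence

# A check returns a rejection reason, or None to let the action through.
Check = Callable[[dict], Optional[str]]

def inner_check(action: dict) -> Optional[str]:
    # Fine-grained: file targets must stay under a hypothetical workspace.
    if action.get("kind") == "file":
        target = os.path.normpath(str(action.get("target", "")))  # lexical only, ignores symlinks
        if not target.startswith("/workspace/"):
            return "target outside workspace"
    return None

def middle_check(action: dict) -> Optional[str]:
    # Medium: file actions must be reads.
    if action.get("kind") == "file" and action.get("mode") != "read":
        return "file writes are not permitted"
    return None

def outer_check(action: dict) -> Optional[str]:
    # Coarse: only known kinds of action may leave the sandbox.
    if action.get("kind") not in {"file", "network"}:
        return "unknown action kind"
    return None

LAYERS: Sequence[Check] = (inner_check, middle_check, outer_check)

def evaluate(action: dict, layers: Sequence[Check] = LAYERS) -> Optional[str]:
    """Return the first rejection reason, or None if every layer passes."""
    for layer in layers:
        reason = layer(action)
        if reason is not None:
            return reason
    return None
```

Because each check is independent, a bug in one layer does not automatically open the others.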

The Sandbox-Capability Tradeoff

Tighter sandboxes mean less risk but also less capability:

                       High Capability
                            │
                            │                  ● No sandbox
                            │                    (maximum risk)
                            │
                            │           ● Permissive
                            │             sandbox
                            │
                            │    ● Layered
                            │      sandbox
                            │
 Low Risk ──────────────────┼────────────────────── High Risk
                            │
                 ● Strict   │
                   sandbox  │
                            │
   ● Pure                   │
     isolation              │
     (no capability)        │
                            │
                       Low Capability
The fundamental tradeoff

The art is finding the right balance for your use case.

Use Case               │ Sandbox Level │ Rationale
───────────────────────┼───────────────┼────────────────────────────────
Production backend     │ Strict        │ High stakes, predictable tasks
Development assistant  │ Medium        │ Needs access, lower stakes
Research sandbox       │ Permissive    │ Exploration valued over safety
Public-facing chat     │ Strict        │ Adversarial users expected

Implementation Patterns

Container-Based Sandboxing

Using Docker or similar to isolate agent execution:

┌─────────────────────────────────────┐
│           HOST SYSTEM               │
│                                     │
│  ┌───────────────────────────────┐  │
│  │      CONTAINER                │  │
│  │  • Isolated filesystem        │  │
│  │  • Limited network            │  │
│  │  • Resource constraints       │  │
│  │  • Minimal privileges         │  │
│  │                               │  │
│  │        [AGENT]                │  │
│  └───────────────────────────────┘  │
│                                     │
└─────────────────────────────────────┘

Tools: Docker and Podman (containers), gVisor (user-space kernel), Firecracker (microVMs)
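
The four properties in the diagram map onto standard `docker run` flags. The helper below only builds the argument list and does not launch anything; the function name, the specific limits, and the `/workspace` mount point are assumptions to adapt per deployment.

```python
def docker_sandbox_command(image: str, command: list[str], workdir: str) -> list[str]:
    """Build a `docker run` argv with conservative isolation flags."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # limited network: none at all
        "--read-only",                           # root filesystem is read-only
        "--tmpfs", "/tmp",                       # in-memory scratch space
        "--volume", f"{workdir}:/workspace:ro",  # project files, mounted read-only
        "--memory", "512m", "--cpus", "1",       # resource constraints
        "--pids-limit", "128",
        "--cap-drop", "ALL",                     # minimal privileges
        "--security-opt", "no-new-privileges",
        "--user", "1000:1000",                   # run as a non-root user
        image, *command,
    ]
```

The resulting list can be passed to `subprocess.run`. Starting from no network and a read-only filesystem, then loosening only what a task needs, keeps the container close to the strict end of the tradeoff.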

Language-Level Sandboxing

Restricting at the interpreter/runtime level:

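  • Python: RestrictedPython, limited builtins
  • JavaScript: isolated-vm; vm2 has a history of sandbox-escape vulnerabilities
  • WebAssembly: sandboxed by design, with host access only through explicit imports

Limitation: Language-level restrictions are harder to enforce and easier to bypass than OS-level isolation.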
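
The limitation is easy to demonstrate with the "limited builtins" approach in plain Python. The sketch below removes `open` from the evaluated code's builtins, then shows that the object graph is still reachable through ordinary attribute access. `try_eval` is a hypothetical helper for this demonstration.

```python
# Expose only a tiny set of builtins to the evaluated expression.
restricted_globals = {"__builtins__": {"len": len}}

def try_eval(expression: str):
    """Evaluate under the restricted globals, returning the result or the error."""
    try:
        return eval(expression, dict(restricted_globals))
    except Exception as exc:  # returned rather than raised, for demonstration
        return exc

# `open` is not exposed, so this fails with NameError.
blocked = try_eval("open('/etc/hostname')")

# Attribute access still walks from a tuple to `object` and on to every loaded class.
leaked = try_eval("().__class__.__base__.__subclasses__()")
```

Here `blocked` is a `NameError`, while `leaked` is a list of classes that restricted code was never meant to see. This is why limited builtins alone are not a security boundary.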

API-Level Sandboxing

The agent only interacts through a controlled API:

Agent → API Gateway → Validated Actions → System
             │
             └── Logging, filtering, rate limiting

This is common for agents-as-a-service, where the sandbox is the API itself.
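
A minimal gateway can be sketched as a registry of named actions with logging at the single entry point. The `Gateway` class and the `word_count` action are hypothetical, and rate limiting is left out to keep the example short.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("agent_gateway")

class Gateway:
    """Single entry point: the agent can call only actions registered here."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def call(self, name: str, **params: Any) -> Any:
        logger.info("agent requested %s with %r", name, params)  # audit trail
        handler = self._handlers.get(name)
        if handler is None:
            raise PermissionError(f"action {name!r} is not exposed by the gateway")
        return handler(**params)

def word_count(text: str) -> int:
    # Hypothetical action that validates its input before doing any work.
    if not isinstance(text, str):
        raise ValueError("text must be a string")
    return len(text.split())

gateway = Gateway()
gateway.register("word_count", word_count)
```

Anything not registered simply does not exist from the agent's point of view, which makes the gateway a default-deny boundary.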

Sandbox Escape

A sandbox escape occurs when an agent, or an attacker acting through an agent, breaks out of containment.

Escape Vectors

Vector                      │ Description
────────────────────────────┼──────────────────────────────────────
Vulnerability exploitation  │ Bugs in sandbox implementation
Resource exhaustion         │ Overwhelming the sandbox
Side channels               │ Extracting info through timing, etc.
Social engineering          │ Convincing humans to weaken sandbox
Prompt injection            │ Manipulating agent to attempt escape

Defense Strategies

  1. Defense in depth: Multiple independent boundaries
  2. Principle of least privilege: Only grant necessary permissions
  3. Regular updates: Patch known vulnerabilities
  4. Monitoring: Detect escape attempts
  5. Assume breach: Design for containment failure

Sandboxing and Trust

Sandboxing relates directly to trust levels:

Trust Level   │ Sandbox Approach
──────────────┼──────────────────────────────────────────
No trust      │ Strict isolation, read-only, no network
Low trust     │ Tight sandbox, allowlisted actions
Medium trust  │ Moderate sandbox, denylisted dangers
High trust    │ Light sandbox, mostly monitoring
Full trust    │ No sandbox (rarely appropriate)

Trust should be earned through demonstrated safe behavior, not assumed.
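
The mapping from trust level to sandbox approach can be written down as configuration. In the sketch below, only the "no trust" row is taken directly from the table; the `network` and `writable` values for the other levels are assumptions, and all names are hypothetical. Full trust is left out on purpose, and unknown levels fall back to the strictest policy.

```python
from dataclasses import dataclass
from enum import IntEnum

class Trust(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass(frozen=True)
class SandboxPolicy:
    network: bool    # may the agent open network connections?
    writable: bool   # may the agent write to the filesystem?
    gate: str        # "allowlist", "denylist", or "monitor"

POLICIES = {
    Trust.NONE:   SandboxPolicy(network=False, writable=False, gate="allowlist"),
    Trust.LOW:    SandboxPolicy(network=False, writable=True,  gate="allowlist"),  # assumed
    Trust.MEDIUM: SandboxPolicy(network=True,  writable=True,  gate="denylist"),   # assumed
    Trust.HIGH:   SandboxPolicy(network=True,  writable=True,  gate="monitor"),    # assumed
}

def policy_for(level) -> SandboxPolicy:
    """Look up a policy, falling back to the strictest one for unknown levels."""
    return POLICIES.get(level, POLICIES[Trust.NONE])
```

Falling back to the strictest policy means trust has to be granted explicitly and is never assumed by omission.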

Dynamic Sandboxing

Advanced systems adjust sandbox boundaries based on context:

  • Task-based: Different limits for different operations
  • Confidence-based: Tighter sandbox when agent is uncertain
  • Escalation-based: Human approval expands sandbox temporarily
  • Reputation-based: Track record influences permissions

This mirrors human society: we grant more freedom to those who’ve demonstrated responsibility.
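
Escalation-based expansion can be sketched as a set of base permissions plus temporary grants that expire on their own. The class and method names are hypothetical, the human approval step is assumed to happen before `grant` is called, and the clock is injectable for testing.

```python
import time

class DynamicPermissions:
    """Base permissions plus temporary grants that expire automatically."""

    def __init__(self, base: set[str], clock=time.monotonic):
        self.base = set(base)
        self.clock = clock
        self._grants: dict[str, float] = {}  # action -> expiry timestamp

    def grant(self, action: str, ttl_seconds: float) -> None:
        # Intended to be called only after a human approves the escalation.
        self._grants[action] = self.clock() + ttl_seconds

    def is_allowed(self, action: str) -> bool:
        if action in self.base:
            return True
        expiry = self._grants.get(action)
        return expiry is not None and self.clock() < expiry
```

Because grants lapse without anyone having to revoke them, the sandbox shrinks back to its baseline by default.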

The Future of Sandboxing

As agents become more capable, sandboxing must evolve:

  • Semantic sandboxing: Restricting based on intent, not just mechanics
  • Formal verification: Mathematically proving sandbox properties
  • Self-sandboxing: Agents that understand and respect their own limits
  • Distributed sandboxing: Containment across multi-agent systems

The challenge: maintaining useful capability while managing increasing power.
