AI Sandbox: Giving Claude a Safe Container to Debug Your Apps

There is a fundamental problem with AI-assisted DevOps: the AI cannot touch anything. It can read your logs. It can check your metrics. It can tell you "the app is probably crashing because of a missing environment variable." But it cannot actually verify that hypothesis. It cannot curl your app's health endpoint. It cannot inspect the filesystem inside your container. It cannot run npm ls to check for dependency conflicts.

We decided to fix that. On March 25, 2026, we built Phase 5 of the MCP server: an AI sandbox container that gives Claude root access to a real Linux environment, connected to your application's network, with pre-installed debugging tools. Not a simulation. Not a read-only view. A writable Alpine container where Claude can install packages, clone repositories, build code, and run commands -- with just enough guardrails to prevent it from destroying the host.

The Design Decision: Full Access, Minimal Blocklist

The CEO directive was clear: Claude needs to install packages, clone repos, build and run apps. A read-only sandbox would defeat the purpose. If Claude suspects a Python dependency conflict, it should be able to pip install the suspect package and reproduce the error. If it thinks a network route is broken, it should be able to curl the endpoint and show the actual response.

We chose a minimal blocklist philosophy rather than an allowlist. The sandbox blocks exactly four categories of commands:

- `rm -rf /` -- recursive deletion of the filesystem root
- `mkfs` -- formatting drives
- `shutdown` and `reboot` -- halting the host (via shared kernel)
- Fork bombs -- `:(){ :|:& };:` and variants

Everything else is permitted. This is a deliberate choice. An allowlist would constantly need updating as Claude discovers new debugging techniques. The blocklist only needs to prevent actions that could damage the host or other containers.

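```rust
// sandbox.rs -- Command validation

/// Substrings that cause a command to be rejected outright.
const BLOCKED_PATTERNS: &[&str] = &[
    "rm -rf /",
    "rm -rf /*",
    "mkfs",
    "shutdown",
    "reboot",
    ":(){ :|:&",  // fork bomb
    "./$0|./$0&", // fork bomb variant
];

pub fn validate_command(cmd: &str) -> Result<(), String> {
    // Lowercase and collapse every run of whitespace to a single space,
    // so "rm   -rf  /" is caught as well as "rm -rf /".
    let normalized = cmd
        .to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ");

    // Matching is by substring, so "rm -rf /tmp/x" is rejected too.
    for pattern in BLOCKED_PATTERNS {
        if normalized.contains(pattern) {
            return Err(format!("Command blocked: contains prohibited pattern '{}'", pattern));
        }
    }
    Ok(())
}
```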
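Substring matching is simple, but it cannot tell `rm -rf /` apart from `rm -rf /tmp/build`. As an illustration only, here is a minimal sketch of a token-aware check for the root-deletion case. The function `is_root_delete` is hypothetical and not part of sh0's `sandbox.rs`; it assumes commands are plain whitespace-separated tokens and makes no attempt to handle quoting, variables, or aliases.

```rust
/// Hypothetical helper: returns true when the command contains an `rm`
/// invocation that is recursive, forced, and targets the filesystem root.
/// Everything after the first `rm` token is treated as its arguments,
/// which errs on the side of blocking.
pub fn is_root_delete(cmd: &str) -> bool {
    let tokens: Vec<&str> = cmd.split_whitespace().collect();

    // Find the first `rm` token; without one there is nothing to check.
    let pos = match tokens.iter().position(|t| *t == "rm") {
        Some(pos) => pos,
        None => return false,
    };
    let args = &tokens[pos + 1..];

    // Gather the letters of short flags such as `-rf` or `-fr`.
    let short_flags: String = args
        .iter()
        .filter(|a| a.starts_with('-') && !a.starts_with("--"))
        .flat_map(|a| a.chars().skip(1))
        .collect();

    let recursive = short_flags.contains('r')
        || short_flags.contains('R')
        || args.contains(&"--recursive");
    let force = short_flags.contains('f') || args.contains(&"--force");
    let targets_root = args.iter().any(|a| *a == "/" || *a == "/*");

    recursive && force && targets_root
}
```

With this approach `rm -rf /tmp/build` passes while `rm -fr /*` is still rejected. It remains a guardrail rather than a security boundary: an invocation such as `/bin/rm` would slip past it, which is why the container limits matter more than the command filter.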

The Container Spec

Each sandbox is an Alpine 3.19 container with the following configuration:

| Property | Value | Rationale |
| --- | --- | --- |
| Base image | Alpine 3.19 | Small (5 MB), fast to pull, comprehensive package manager |
| Memory | 1 GB | Enough for `npm install`, Python builds, Go compilation |
| CPU | 2 cores | Enough for parallel builds without starving the host |
| User | root | Required for `apk add`, package installation, filesystem access |
| Timeout | 5 minutes per command | Covers `git clone`, `npm install`, and compilation |
| Output limit | 100 KB | Enough for code analysis output, prevents context window overflow |
| Network | `container:{app_id}` mode | Shares localhost with the app container |

The pre-installed packages cover the most common debugging scenarios:

curl wget           -- HTTP requests
bind-tools          -- DNS diagnostics (dig, nslookup)
netcat-openbsd      -- TCP connectivity testing
jq                  -- JSON parsing
git                 -- Repository cloning
nodejs npm          -- Node.js runtime + package manager
python3 py3-pip     -- Python runtime + package manager
bash                -- Shell scripting

The `network_mode: container:{app_id}` setting is critical. It means the sandbox shares the application container's network namespace. When Claude runs `curl http://localhost:3000/health` inside the sandbox, it hits the app's port directly -- no DNS resolution, no cross-container networking, just localhost. This removes routing and DNS as variables when debugging network issues.
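To make the spec concrete, here is a rough sketch of how the same settings would look as `docker run` arguments. This is an illustration under assumptions, not sh0's implementation: the article's code talks to Docker through `state.docker`, not the CLI, and the `SandboxSpec` struct, the `docker_run_args` function, the container names, and the `tail -f /dev/null` keep-alive command are all hypothetical.

```rust
/// Hypothetical description of the sandbox container, mirroring the table above.
pub struct SandboxSpec {
    pub image: &'static str,
    pub memory_mb: u32,
    pub cpus: u32,
    pub user: &'static str,
}

impl Default for SandboxSpec {
    fn default() -> Self {
        SandboxSpec {
            image: "alpine:3.19",
            memory_mb: 1024, // 1 GB
            cpus: 2,
            user: "root",
        }
    }
}

/// Builds the argument list for `docker run`. The key flag is
/// `--network container:<app>`, which joins the app's network namespace.
pub fn docker_run_args(spec: &SandboxSpec, sandbox_name: &str, app_container: &str) -> Vec<String> {
    vec![
        "run".to_string(),
        "--detach".to_string(),
        "--name".to_string(),
        sandbox_name.to_string(),
        "--memory".to_string(),
        format!("{}m", spec.memory_mb),
        "--cpus".to_string(),
        spec.cpus.to_string(),
        "--user".to_string(),
        spec.user.to_string(),
        "--network".to_string(),
        format!("container:{}", app_container),
        spec.image.to_string(),
        // Assumed keep-alive command so the container stays up between execs.
        "tail".to_string(),
        "-f".to_string(),
        "/dev/null".to_string(),
    ]
}
```

Calling `docker_run_args(&SandboxSpec::default(), "sandbox-demo", "app-demo")` yields an argument list containing `--network container:app-demo`, which is the part that lets `localhost` inside the sandbox resolve to the app.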

The Five MCP Tools

Phase 5 added five tools to the MCP server, bringing the total to 25:

sandbox_exec_command (write risk)

The core tool. Executes any shell command inside the sandbox container and returns stdout, stderr, and the exit code.

```rust
// tools.rs -- sandbox_exec_command executor
"sandbox_exec_command" => {
    let app_id = resolve_app(&params, &state.db).await?;
    let command = params["command"].as_str()
        .ok_or("Missing 'command' parameter")?;

    // Validate against blocklist
    sandbox::validate_command(command)?;

    // Ensure sandbox exists (creates if needed).
    // `app_name` is assumed to be in scope from the surrounding handler (not shown).
    let sandbox_id = sandbox::ensure_sandbox(
        &state.docker, &app_id, &app_name
    ).await?;

    // Execute with 5-minute timeout, 100KB output limit
    let output = sandbox::exec_in_sandbox(
        &state.docker, &sandbox_id, command
    ).await?;

    serde_json::to_string(&json!({
        "stdout": output.stdout,
        "stderr": output.stderr,
        "exit_code": output.exit_code
    }))
}
```
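The 100 KB output limit is enforced somewhere inside `exec_in_sandbox`, which the article does not show. One plausible way to apply such a cap, sketched here with hypothetical names (`OUTPUT_LIMIT_BYTES`, `truncate_output`), is to cut the captured text at the last UTF-8 character boundary at or below the limit so the result is still valid text.

```rust
/// Assumed cap, matching the 100 KB figure from the container spec.
pub const OUTPUT_LIMIT_BYTES: usize = 100 * 1024;

/// Returns the output cut down to at most `limit` bytes, plus a flag saying
/// whether anything was dropped. The cut never splits a multi-byte character.
pub fn truncate_output(output: &str, limit: usize) -> (&str, bool) {
    if output.len() <= limit {
        return (output, false);
    }
    // Walk backwards until we land on a character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = limit;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    (&output[..end], true)
}
```

Returning the truncation flag alongside the text lets the caller tell the model that output was cut, so it can rerun the command with a narrower filter such as `head` or `grep` instead of reasoning over a silently incomplete result.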

sandbox_read_file (read risk)

Reads a file from the sandbox or from the app's mounted volumes. Useful for inspecting configuration files, package manifests, and log files without executing arbitrary commands.

sandbox_list_processes (read risk)

Runs ps aux inside the application container (not the sandbox) and returns the process list. This is the quickest way for Claude to see what is actually running inside the app.

sandbox_check_connectivity (read risk)

Tests network connectivity from the sandbox using nc (TCP) or curl (HTTP). Claude uses this to verify that an app can reach its database, that an external API is reachable, or that a port is actually listening.
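The tool itself shells out to `nc` or `curl`. For readers who want to see what a TCP reachability check amounts to, here is an equivalent sketch using only the Rust standard library. The function `tcp_reachable` is hypothetical and is not how sh0 implements the tool.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `host:port` succeeds within `timeout`.
/// Resolution failures, refused connections, and timeouts all count as
/// "not reachable".
pub fn tcp_reachable(host: &str, port: u16, timeout: Duration) -> bool {
    // A hostname may resolve to several addresses; try each in turn.
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false,
    };
    addrs
        .into_iter()
        .any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok())
}
```

Because the sandbox shares the app's network namespace, a call such as `tcp_reachable("localhost", 3000, Duration::from_secs(2))` made from inside it would answer the question "is anything listening on the app's port 3000?" directly.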

sandbox_status (read risk)

Inspects the sandbox container itself -- is it running, how long has it been up, how much memory is it using. Useful for debugging sandbox issues.

Lifecycle Hooks

Sandboxes are managed automatically through app lifecycle hooks:

```rust
// handlers/apps.rs -- Post-deploy hook
async fn post_deploy(state: &AppState, app: &App) {
    if app.sandbox_enabled {
        let docker = state.docker.clone();
        let app_id = app.id.clone();
        let app_name = app.name.clone();
        tokio::spawn(async move {
            if let Err(e) = sandbox::ensure_sandbox(&docker, &app_id, &app_name).await {
                tracing::warn!("Failed to create sandbox for {}: {}", app_name, e);
            }
        });
    }
}
```

The lifecycle is straightforward:

- Deploy: if sandbox is enabled, create or restart the sandbox container (non-blocking `tokio::spawn`)
- Start app: ensure sandbox is running
- Stop app: stop the sandbox container
- Delete app: destroy the sandbox container (best-effort, does not fail the delete if sandbox cleanup fails)

The ensure_sandbox function is idempotent. If the sandbox already exists and is running, it returns immediately. If it exists but is stopped, it restarts it. If it does not exist, it creates it. This means Claude's first tool call in a session might take a few seconds to spin up the sandbox, but subsequent calls are instant.
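The three cases described for `ensure_sandbox` amount to a small decision table. The sketch below models only that decision, with hypothetical type names (`SandboxState`, `EnsureAction`); the real function also performs the Docker calls, which are omitted here.

```rust
/// Hypothetical view of what Docker reports for the sandbox container.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxState {
    Running,
    Stopped,
    Missing,
}

/// What an ensure call needs to do for a given state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnsureAction {
    Reuse,   // already running: return immediately
    Restart, // exists but stopped: start it again
    Create,  // does not exist: create and start it
}

pub fn plan_ensure(state: SandboxState) -> EnsureAction {
    match state {
        SandboxState::Running => EnsureAction::Reuse,
        SandboxState::Stopped => EnsureAction::Restart,
        SandboxState::Missing => EnsureAction::Create,
    }
}

/// Every successful action leaves the sandbox running, which is what makes
/// repeated ensure calls idempotent.
pub fn state_after(_action: EnsureAction) -> SandboxState {
    SandboxState::Running
}
```

Whatever the starting state, a second call sees `Running` and plans `Reuse`. That is the property that lets the deploy hook and the tool executor both call ensure without coordinating with each other.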

The Dashboard Integration

The sandbox is opt-in per application. The dashboard's app settings page includes a toggle:

```svelte
<!-- AppSettings.svelte -->
<div class="flex items-center justify-between">
  <div>
    <h4 class="font-medium text-dark-100">{$t('settings.ai_sandbox')}</h4>
    <p class="text-sm text-dark-400">{$t('settings.ai_sandbox_description')}</p>
  </div>
  <button
    onclick={() => toggleSandbox()}
    class="relative inline-flex h-6 w-11 items-center rounded-full
      {app.sandbox_enabled ? 'bg-sh0-500' : 'bg-dark-600'}">
    <span class="inline-block h-4 w-4 transform rounded-full bg-white transition
      {app.sandbox_enabled ? 'translate-x-6' : 'translate-x-1'}" />
  </button>
</div>
```

The toggle sends a PATCH request to update sandbox_enabled on the app. When enabled, the next deployment automatically creates the sandbox. The AI capabilities modal (which shows users what Claude can do for each app) adds a "Sandbox" category with five tool descriptions, colour-coded in cyan.

The ProcessingSteps component was updated with icons and labels for all five sandbox tools, so users see "Executing command in sandbox..." or "Checking connectivity..." in the tool execution timeline.

All 16 new i18n keys were added across all five languages (English, French, Spanish, Portuguese, Swahili).

Why This Matters: Debugging by Doing, Not Guessing

Without a sandbox, an AI debugging session looks like this:

User: My Node app is returning 502 errors.

Claude: Based on the deployment logs, the app started successfully. The 502 might be caused by: (1) the app listening on the wrong port, (2) a health check timeout, (3) a dependency issue. Could you check package.json to see if the start script is correct?

With a sandbox:

User: My Node app is returning 502 errors.

Claude: Let me check. [calls get_app_details] The app is configured on port 3000. [calls sandbox_exec_command: curl -v http://localhost:3000/] The connection is refused -- nothing is listening on port 3000. [calls sandbox_exec_command: curl -v http://localhost:8080/] Port 8080 returns a 200. Your app is listening on 8080 but sh0 is routing to 3000. Update the port in your sh0.yaml or set the PORT environment variable to 3000.

The gap between the two sessions is the gap between a suggestion and a diagnosis. The sandbox turns Claude from a well-informed guesser into an actual debugger that can reproduce and verify problems.

Security Considerations

Giving an AI root access to a container sounds alarming. Here is why it is safe:

Isolation: The sandbox is a separate container. It shares the app's network namespace (localhost) but not its filesystem, processes, or memory. A malicious command in the sandbox cannot modify the app's files or kill its processes.

Resource limits: 1 GB RAM and 2 CPU cores. A runaway process that exceeds the memory limit gets OOM-killed inside the container, and CPU usage is capped at two cores. The host is unaffected.

No host access: The sandbox has no privileged capabilities, no host mounts, and no access to the Docker socket. It cannot escape the container.

Command validation: The blocklist prevents the small number of commands that could damage the container's own filesystem in ways that require host intervention (like formatting a virtual device).

Timeout: 5-minute execution limit per command. No command can run indefinitely.

Output limit: 100 KB cap prevents a command from consuming unbounded memory in the MCP response.

Scope enforcement: The sandbox_exec_command tool is classified as write risk. Users with read-scoped API keys cannot execute commands. Users with standard scope can.
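The scope rule can be written down as a small lookup. The following sketch is an assumption-laden illustration, not sh0's code. The risk classification of the five tools comes from the sections above, but the `Risk` and `KeyScope` enums, the function names, and the choice to deny unknown tools are hypothetical. It also assumes that read-scoped keys may call read-risk tools, which the article implies but does not state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Risk {
    Read,
    Write,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyScope {
    Read,
    Standard,
}

/// Risk level for each sandbox tool, as classified in this article.
pub fn tool_risk(tool: &str) -> Option<Risk> {
    match tool {
        "sandbox_exec_command" => Some(Risk::Write),
        "sandbox_read_file"
        | "sandbox_list_processes"
        | "sandbox_check_connectivity"
        | "sandbox_status" => Some(Risk::Read),
        _ => None,
    }
}

/// Decides whether an API key with the given scope may call the tool.
pub fn is_allowed(scope: KeyScope, tool: &str) -> bool {
    match (scope, tool_risk(tool)) {
        (_, None) => false,                // unknown tool: deny (assumption)
        (_, Some(Risk::Read)) => true,     // read-risk tools: any scope (assumption)
        (KeyScope::Standard, Some(Risk::Write)) => true,
        (KeyScope::Read, Some(Risk::Write)) => false,
    }
}
```

Under these assumptions a read-scoped key can inspect sandbox status and files but is refused `sandbox_exec_command`, while a standard-scoped key can call all five tools.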

The philosophical position is simple: the sandbox is less dangerous than SSH access, and we already give users SSH access through the dashboard terminal. The sandbox just makes that access available to AI clients through a structured, auditable protocol.

Build Verification

The implementation added 460 lines of Rust in the new sandbox.rs module, 230 lines in tools.rs for the five tool executors, and 85 lines in openapi.rs for the tool definitions. Dashboard changes totalled approximately 90 lines across 6 files.

cargo check -- clean
cargo test -- all tests pass (including 4 new sandbox unit tests: naming convention, allowed commands, blocked commands, shell escape handling)
npm run build (dashboard) -- clean

The total MCP tool count is now 25, verified by an assertion in the test suite.

Next in the series: From cargo build to a Live Server: The Release Pipeline -- multi-stage Docker builds, cross-compilation battles, and the first production deploy to demo.sh0.app.