Designing a Safety Layer for AI-Driven Server Management

When you build an MCP server that lets AI agents manage production infrastructure, the question isn't can the agent restart your app -- it's should it, and who said so.

sh0's MCP server started with 12 read-only tools. An AI agent could list apps, check server status, read logs. Useful, but limited. Phase 3 adds write operations: restart, deploy, scale, backup, trigger cron jobs, and yes -- delete apps and databases.

The challenge: how do you give AI agents real power without creating a footgun?

The Three-Layer Safety Model

Layer 1: Scoped API Keys

Every MCP connection authenticates with an API key. Each key now carries a scope: read, standard, or admin.

The scope determines what risk level of tools the key can access:

Scope	Read tools	Write tools	Destructive tools
`read`	Yes	No	No
`standard`	Yes	Yes	No
`admin`	Yes	Yes	Yes (with confirmation)

The implementation is deliberately simple. No complex RBAC matrices, no per-tool permission lists. Three levels, three risk categories, a straightforward matrix. This is important because the people configuring these keys need to understand the security model in 30 seconds.

For backward compatibility, existing API keys default to admin scope. JWT-authenticated users (the dashboard) bypass scope enforcement entirely -- they already have full access.

Layer 2: Risk Classification

Every MCP tool has a risk level:

Read: list_apps, get_server_status, server_metrics -- cannot modify state
Write: restart_app, deploy_app, scale_app -- modifies state but is recoverable
Destructive: delete_app, delete_database -- permanent, irreversible

The risk level is declared in the OpenAPI spec via x-mcp-risk extensions, the same mechanism Phase 2 introduced for auto-generating tool definitions. But for enforcement, we use a simple tool_risk() match function in the transport layer. The spec is the source of truth for documentation; the match statement is the source of truth for enforcement. Deliberately decoupled -- you cannot accidentally change enforcement by editing a doc comment.

Layer 3: Confirmation Tokens

Even with admin scope, destructive operations don't execute immediately. Instead:

The AI agent calls delete_app with app_id: "myapp"
The MCP server returns a message:
> This action will DELETE application 'myapp' and its container. This cannot be undone. To confirm, call confirm_action with token: abc-123-def
The agent must call confirm_action with that token
Only then does the deletion proceed

The token is: - Single-use: removed after confirmation - Time-limited: 5-minute TTL with lazy eviction - User-bound: the confirming user must match the requesting user - In-memory: no persistence, no cleanup cron -- just a HashMap with TTL checks

This gives the AI (or the human supervising it) a deliberate pause. The agent sees a clear description of the consequences and must make an explicit second call. For agentic workflows where a human reviews the agent's actions, this creates a visible checkpoint.

Why Not Just Use Middleware?

The alternative was an Axum middleware that intercepts all MCP requests and checks permissions. We rejected this for three reasons:

MCP tools don't map 1:1 to REST endpoints. The confirm_action tool has no REST equivalent. The get_app_logs tool calls Docker directly, not a REST handler.

The confirmation flow is MCP-specific. REST clients use the dashboard UI for confirmation. MCP clients need a protocol-level mechanism.

Scope semantics differ between REST and MCP. A REST API key might have fine-grained endpoint permissions. MCP scopes are coarser by design -- AI agents need simple, predictable access models.

The Audit Trail

Every write and destructive MCP operation logs to the audit system with an mcp: prefix:

mcp:restart_app    app     app-123    myapp
mcp:deploy_app     deploy  dep-456    myapp
mcp:delete_app     app     app-123    myapp

This makes it trivial to answer "what did the AI agent do?" after the fact. Filter audit logs by mcp:* and you have a complete history of AI-initiated mutations.

What We Learned

Simple beats configurable. Three scopes, three risk levels, one matrix. Anyone can understand it. The temptation was to build per-tool permissions, role hierarchies, time-based access windows. All of that complexity would serve edge cases while making the common case harder to reason about.

Confirmation tokens are surprisingly effective. They solve two problems: preventing accidental destruction, and creating an audit-visible decision point. They also work naturally with how AI agents operate -- the agent sees the warning message in its context and can decide whether to proceed.

The existing scopes column saved a migration. The API key table already had a free-form scopes field. We repurposed it with defined semantics (read, standard, admin) and a backward-compatible default. No schema change, no migration, no downtime risk.

Next Steps

Phase 3 is complete and heading into two rounds of independent audits. The auditors will review: - Scope enforcement correctness (no bypass paths) - Confirmation token security (no replay, no cross-user, no timing attacks) - Write executor safety (no unwrap, proper error handling, correct Docker calls) - Backward compatibility (existing read tools unchanged, existing keys keep working)

After audits, we'll test with Claude Desktop as an MCP client to validate the end-to-end experience.

Designing a Safety Layer for AI-Driven Server Management

The Three-Layer Safety Model

Layer 1: Scoped API Keys

Layer 2: Risk Classification

Layer 3: Confirmation Tokens

Why Not Just Use Middleware?

The Audit Trail

What We Learned

Next Steps

Responses

Related Articles

Heroku Alternative for Node.js in 2026: Why Developers Are Self-Hosting

How to Deploy PostgREST on a VPS in 10 Minutes

Best cPanel Alternative for Node.js, Rust, Python, and Go Deployments