The AI gateway we built in the previous article gave Claude 10 tools to query a user's server through the dashboard. It worked. But it had a fundamental limitation: the tools only existed inside our chat UI. If you wanted to use Claude Desktop, Cursor, or Claude Code to manage your sh0 server, you could not.
The Model Context Protocol changed that. MCP is an open standard -- JSON-RPC 2.0 over HTTP -- that lets any AI client discover and call tools on any server. On March 24 and 25, 2026, we built an MCP server directly into the sh0 Rust binary. By the end, we had 25 tools, a three-tier safety system, OpenAPI-driven tool generation, and a sandbox container that gives Claude root access to debug your apps.
We did this across five phases, six implementation agents, ten audit sessions, and approximately seventeen total Claude sessions orchestrated by the AI CTO. This is how.
Phase 1: The Protocol Foundation (12 Read-Only Tools)
An MCP server needs three things: a transport layer, a protocol layer, and tools. We started with the transport.
Streamable HTTP Transport
The MCP 2025-03-26 specification defines Streamable HTTP as the primary transport. The client sends JSON-RPC requests as HTTP POST to a single endpoint; the server responds with JSON-RPC results. Session state is maintained via an Mcp-Session-Id header.
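On the wire, the handshake is plain JSON-RPC. A sketch of the exchange (the field set follows the MCP 2025-03-26 spec; the serverInfo values here are illustrative):

```json
POST /api/v1/mcp
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": { "name": "claude-desktop", "version": "1.0" }
  }
}
```

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": { "tools": {} },
    "serverInfo": { "name": "sh0", "version": "0.1.0" }
  }
}
```

The session ID comes back in the Mcp-Session-Id response header, and the client echoes it on every subsequent request.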
```rust
// transport.rs -- Session management
struct McpSession {
    user_id: String,
    created_at: Instant,
    tools_listed: bool,
}

// Keyed by the Mcp-Session-Id value issued during initialize
static MCP_SESSIONS: LazyLock<Mutex<HashMap<String, McpSession>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

const SESSION_TTL: Duration = Duration::from_secs(3600); // 1 hour
```
The initialize handshake creates a session and returns the server's capabilities. Every subsequent request must include the session ID. Sessions expire after one hour with lazy eviction -- we clean up expired entries on each incoming request rather than running a background timer. For a single-server deployment tool, this is the right trade-off: simple, no background tasks, and the HashMap never grows unbounded.
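The lazy-eviction sweep is a one-liner over the session map. A simplified, dependency-free sketch (the real map also holds user_id and lives behind a static Mutex):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct McpSession {
    created_at: Instant,
}

// Lazy eviction: drop expired sessions on each incoming request instead
// of running a background timer, so the map never grows unbounded.
fn evict_expired(sessions: &mut HashMap<String, McpSession>, ttl: Duration) {
    sessions.retain(|_, s| s.created_at.elapsed() < ttl);
}
```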
12 Hand-Curated Tools
Phase 1 defined 12 read-only tools in tools.rs, each with a hand-written JSON Schema and a manual executor:
- list_apps
- get_app_details
- get_app_env_vars (count only, never values)
- list_deployments
- get_deployment_logs
- list_domains
- list_databases
- list_cron_jobs
- list_backups
- list_alerts
- get_app_logs (Docker container logs)
- get_server_status (CPU, memory, disk, uptime from the metrics table)
Every tool that touches the database wraps its query in tokio::task::spawn_blocking because SQLite is synchronous. Every tool returns a JSON string as its result, never raw structs. And critically, environment variable values are never exposed -- only the count.
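The redaction rule for get_app_env_vars is worth spelling out. A minimal sketch of the idea (hand-rolled JSON to keep it dependency-free; the real executor serializes with serde_json inside spawn_blocking):

```rust
use std::collections::HashMap;

// Only the count leaves the server -- environment variable values are
// never serialized into a tool result.
fn env_vars_result(vars: &HashMap<String, String>) -> String {
    format!(r#"{{"env_var_count":{}}}"#, vars.len())
}
```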
The Audit Caught Two Critical Bugs
After implementation, we ran a full security and protocol compliance audit. It found two critical issues:
1. server_metrics returned stale data. The metrics query returned rows in ORDER BY recorded_at DESC (newest first), but the code called .last() -- which grabbed the oldest row in the result set. Fix: .last() became .first().
2. No protocol version validation. The MCP spec requires the server to reject unsupported protocol versions during initialize. Our code ignored the field entirely. Fix: added a version check that returns -32602 (invalid params) on mismatch.
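The version check itself is small. A sketch of the fix, assuming a supported-version list (the list contents here are illustrative):

```rust
// Reject unsupported protocol versions during initialize with JSON-RPC
// error -32602 (invalid params), as the MCP spec requires.
const SUPPORTED_VERSIONS: &[&str] = &["2025-03-26"];

fn check_protocol_version(requested: &str) -> Result<(), i64> {
    if SUPPORTED_VERSIONS.contains(&requested) {
        Ok(())
    } else {
        Err(-32602) // invalid params
    }
}
```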
The audit also caught an important memory leak: sessions accumulated forever because there was no TTL. This is the fix that added the one-hour expiry with lazy eviction.
Phase 2: OpenAPI-Driven Tool Generation
Twelve hand-curated tools worked, but the pattern was fragile. Each tool required a JSON Schema definition in tools.rs and a matching executor -- two places to keep in sync. When we had 12 tools, this was manageable. At 25, it would be a maintenance burden.
The insight: sh0 already has an OpenAPI spec generated by utoipa. Every REST endpoint already has typed parameters, documented descriptions, and structured schemas. Why write tool definitions twice?
The x-mcp Extension Protocol
We defined five custom OpenAPI extensions that annotate existing handlers:
```rust
// handlers/apps.rs
#[utoipa::path(
    get, path = "/api/v1/apps",
    responses((status = 200, body = Vec<AppResponse>)),
    extensions(
        ("x-mcp-enabled", json!(true)),
        ("x-mcp-risk", json!("read")),
        ("x-mcp-description", json!("List all deployed applications with status, domains, and resource usage."))
    )
)]
async fn list_apps(/* ... */) -> impl IntoResponse { /* ... */ }
```

| Extension | Purpose |
|---|---|
| x-mcp-enabled | Marks this endpoint as an MCP tool |
| x-mcp-risk | Risk classification: read, write, or destructive |
| x-mcp-name | Override the tool name (default: operationId) |
| x-mcp-description | Override the description for AI-friendly wording |
| x-mcp-param-map | Remap parameter names (e.g., path param id becomes app_id) |
The Generator
The openapi.rs module iterates the OpenAPI spec at runtime:
```rust
pub fn tools_from_openapi() -> Vec<McpTool> {
    let spec = ApiDoc::openapi();
    let mut tools = Vec::new();

    for (path, item) in spec.paths.paths {
        for (method, operation) in item.operations {
            if let Some(extensions) = &operation.extensions {
                if extensions.get("x-mcp-enabled") == Some(&json!(true)) {
                    if let Some(tool) = tool_from_operation(&operation, &path, &method) {
                        tools.push(tool);
                    }
                }
            }
        }
    }

    // Add manual tools that have no REST equivalent
    tools.extend(manual_tool_definitions());
    tools
}
```
The manual_tool_definitions() function handles tools that do not map to REST endpoints -- like get_app_logs, which calls the Docker API directly rather than going through an HTTP handler.
We chose a hybrid approach: definitions are auto-generated from OpenAPI, but execution remains manual in tools.rs. Adding a new MCP tool requires two steps: (1) add extensions(...) to the handler's utoipa annotation, and (2) add a match arm in the executor. This is simpler than full auto-routing and avoids the complexity of internal HTTP dispatch.
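Step (2) is just a match arm per tool. A sketch of the dispatch shape (the handler bodies here are stand-ins, not the real executors):

```rust
// tools.rs -- execution stays manual: one arm per x-mcp-enabled endpoint.
// Every tool result is a JSON string, never a raw struct.
fn execute_tool(name: &str) -> Result<String, String> {
    match name {
        "list_apps" => Ok(r#"{"apps":[]}"#.to_string()),
        "get_server_status" => Ok(r#"{"cpu_percent":3.2}"#.to_string()),
        // ...remaining tools...
        _ => Err(format!("unknown tool: {name}")),
    }
}
```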
Four unit tests verify that the generator produces the correct number of tools, that schemas match the hand-written originals, and that parameter remapping works correctly.
Phase 3: Write Operations and the Three-Tier Safety System
Read-only tools are safe by definition. Write tools are not. Phase 3 added seven new tools that can modify server state, and a safety architecture to prevent catastrophic mistakes.
The Seven New Tools
Write tools (5):
- restart_app -- restart a running container
- deploy_app -- trigger a new deployment
- scale_app -- change replica count
- trigger_backup -- run a backup immediately
- trigger_cron -- execute a cron job now
Destructive tools (2):
- delete_app -- permanently remove an application
- delete_database -- permanently remove a database
Scoped API Keys
We added a scope field to the AuthUser struct, populated from the API key's existing scopes column:
```rust
// extractors.rs
pub struct AuthUser {
    pub user_id: String,
    pub scope: Option<String>, // None = unrestricted (JWT), Some("read"|"standard"|"admin")
}
```

Three scope levels control what an MCP client can do:
| Scope | Allowed Tools |
|---|---|
| read | Read-only tools (list_apps, get_server_status, etc.) |
| standard | Read + write tools (restart_app, deploy_app, scale_app, etc.) |
| admin | All tools, including destructive (delete_app, delete_database) |
JWT-authenticated users (dashboard sessions) get unrestricted access. API keys default to admin for backward compatibility, but users can create scoped keys for MCP clients they do not fully trust.
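The scope check reduces to comparing the key's scope against the tool's risk classification. A sketch of that gate, with illustrative names (not the actual sh0 code):

```rust
// Risk levels from the x-mcp-risk annotation; discriminant order gives
// Read < Write < Destructive for the derived comparison.
#[derive(PartialEq, PartialOrd)]
enum Risk {
    Read,
    Write,
    Destructive,
}

fn scope_allows(scope: Option<&str>, risk: Risk) -> bool {
    match scope {
        None => true,                    // JWT session: unrestricted
        Some("admin") => true,           // all tools, including destructive
        Some("standard") => risk <= Risk::Write,
        Some("read") => risk == Risk::Read,
        _ => false,                      // unknown scope: deny
    }
}
```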
Confirmation Tokens for Destructive Operations
Even with admin scope, destructive tools do not execute immediately. Instead, they return a confirmation token:
```rust
// transport.rs -- Confirmation flow
struct PendingConfirmation {
    user_id: String,
    tool_name: String,
    params: serde_json::Value,
    created_at: Instant,
}

// Keyed by the confirmation token
static PENDING_CONFIRMATIONS: LazyLock<Mutex<HashMap<String, PendingConfirmation>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

const CONFIRMATION_TTL: Duration = Duration::from_secs(300); // 5 minutes
```
When Claude calls delete_app, the server returns:
```json
{
  "confirmation_required": true,
  "confirmation_token": "cf_a1b2c3d4...",
  "message": "This will permanently delete app 'my-api' and all its data. Call confirm_action with this token to proceed.",
  "expires_in_seconds": 300
}
```

The AI client must then call confirm_action with the token to proceed. The token is single-use, user-scoped (prevents cross-user confirmation), and expires after 5 minutes with lazy eviction. This creates a human-in-the-loop checkpoint: the user sees Claude's intent to delete and can approve or reject it.
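A sketch of how confirm_action can enforce all three properties at once (simplified: params omitted, and burning the token on a failed check is one possible policy, not necessarily the exact sh0 behaviour):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct PendingConfirmation {
    user_id: String,
    tool_name: String,
    created_at: Instant,
}

// remove() makes the token single-use; the user_id comparison blocks
// cross-user confirmation; the TTL check enforces the 5-minute expiry.
fn consume_token(
    pending: &mut HashMap<String, PendingConfirmation>,
    token: &str,
    user_id: &str,
    ttl: Duration,
) -> Option<PendingConfirmation> {
    let conf = pending.remove(token)?; // gone even if the checks below fail
    if conf.user_id != user_id || conf.created_at.elapsed() > ttl {
        return None;
    }
    Some(conf)
}
```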
Audit Logging
Every write and destructive MCP tool call is logged via the existing audit system:
```rust
audit::record(
    &state.db,
    &user.user_id,
    &format!("mcp:{}", tool_name),
    Some("app"),
    Some(&app_id),
    Some(&app_name),
).await;
```

This means the admin dashboard shows a complete history of what AI clients have done to the server -- which app was restarted, which deployment was triggered, and who authorised it.
Phase 4: Gateway MCP Connector (3-Way Routing)
With the MCP server running on the sh0 binary and the AI gateway running on sh0.dev, we needed to connect them. Phase 4 added an MCP connector to the gateway that enables 3-way routing:
1. MCP tools -- if the user's sh0 server has an MCP endpoint, route tool calls through it.
2. Legacy tools -- if no MCP endpoint is available, fall back to the original 10 gateway tools (client-side execution).
3. Documentation tools -- tools like generate_config_file that do not need server access are handled directly by the gateway.
The connector auto-discovers the MCP endpoint by probing https://{panel_domain}/api/v1/mcp during the first tool call. If it responds to an initialize request, the gateway maintains an MCP session and routes all subsequent tool calls through it. If not, the gateway falls back to the legacy SSE-based tool calling flow.
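The routing decision itself is a small pure function. A sketch with illustrative names (the documentation-tool list is an assumption beyond generate_config_file):

```rust
#[derive(Debug, PartialEq)]
enum Route {
    Mcp,     // route through the server's MCP endpoint
    Legacy,  // fall back to the original client-side gateway tools
    Gateway, // handled directly by the gateway, no server access needed
}

fn route_tool(tool: &str, mcp_available: bool) -> Route {
    const DOC_TOOLS: &[&str] = &["generate_config_file"];
    if DOC_TOOLS.contains(&tool) {
        Route::Gateway
    } else if mcp_available {
        Route::Mcp
    } else {
        Route::Legacy
    }
}
```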
The CTO Orchestration: 17 Sessions
The MCP rollout was the most complex orchestration the AI CTO had performed. The numbers tell the story:
- 6 implementation agents -- one for Phase 1 (the CTO itself), one each for Phases 2-5, plus one for a bug fix
- 10 auditor agents -- two independent audit rounds per phase
- 1 research agent -- evaluated the rmcp Rust SDK (rejected due to an Axum 0.8 dependency conflict)
- 3 critical bugs caught across all audits
- 11 important issues caught -- including the session memory leak, the O(n) app lookup, and a protocol version validation gap
The CTO session designed the architecture, wrote the Phase 1 implementation directly, drafted implementation prompts for the four remaining phases, drafted all audit prompts, reviewed every audit result, and made architectural decisions when agents diverged (rejecting rmcp, approving root access for sandbox containers, deferring MCP Prompts).
This is the operational reality of an AI CTO managing a multi-agent engineering team. Each agent sees only its phase. The CTO sees the whole system.
The Final Tool Count
After five phases, sh0's MCP server exposes 25 tools:
| Category | Tools | Count |
|---|---|---|
| Apps | list_apps, get_app_details, get_app_env_vars, get_app_logs, restart_app, deploy_app, scale_app, delete_app | 8 |
| Deployments | list_deployments, get_deployment_logs | 2 |
| Domains | list_domains | 1 |
| Databases | list_databases, delete_database | 2 |
| Cron | list_cron_jobs, trigger_cron | 2 |
| Backups | list_backups, trigger_backup | 2 |
| Monitoring | list_alerts, get_server_status | 2 |
| Safety | confirm_action | 1 |
| Sandbox | sandbox_exec_command, sandbox_read_file, sandbox_list_processes, sandbox_check_connectivity, sandbox_status | 5 |
Every tool has a risk classification. Every write tool is audit-logged. Every destructive tool requires confirmation. Every tool definition is generated from the OpenAPI spec (except the 6 manual tools that have no REST equivalent).
The test suite grew from 442 to 488 tests across all phases, with zero failures on the final build. cargo check is clean. The dashboard builds at 43 KB for the AI page.
What We Learned
Building an MCP server in a compiled Rust binary taught us four things:
1. OpenAPI-driven generation eliminates drift. When tool schemas are derived from the same annotations that generate the API docs, they cannot diverge. One source of truth, two consumers.
2. Three-tier safety is the minimum for AI write operations. Scoped keys, risk classification, and confirmation tokens are not paranoia -- they are the bare minimum to ship AI-powered infrastructure management without a negligence lawsuit.
3. Audits catch what implementation misses. The two-round audit process (fresh agent each time) found issues the implementation agent could not see because it was too close to the code. The stale-metrics bug and the session memory leak were invisible from inside the implementation.
4. CTO orchestration scales. Seventeen sessions across two days, coordinated by a single AI CTO session, produced 25 tools with zero regressions. The pattern -- implement, audit, fix, audit again -- works at scale.
---
Next in the series: AI Sandbox: Giving Claude a Safe Container to Debug Your Apps -- how we gave Claude root access to an Alpine container so it can actually debug your deployments instead of guessing.