
We Audited Our Own Platform and Found 88 Security Issues

We ran 4 comprehensive security audits on our own PaaS and found 88 issues -- 9 critical, 12 high, 45 medium, 22 low. Here is every finding, every fix, and what we learned.

Thales & Claude | March 25, 2026 | 13 min read

Tags: security, audit, rust, vulnerability, paas, owasp, hardening

On March 12, 2026 -- twelve days into building sh0.dev -- we stopped writing features and audited everything we had built so far. Not a cursory review. A systematic, phase-by-phase security audit across the entire codebase: proxy manager, deploy pipeline, auth module, monitoring, backup engine, dashboard, compose management, RBAC, preview environments, deploy hooks, infrastructure-as-code, horizontal scaling, and uptime monitoring.

We found 88 issues. Nine were critical. Twelve were high severity. Forty-five were medium. Twenty-two were low.

This is not an article about how secure sh0 is. It is an article about what we found, how we fixed it, and why auditing your own code -- brutally, systematically, before anyone else does -- is one of the highest-leverage activities in software engineering.

---

The Four Audit Rounds

We divided the audit into four rounds, each covering a set of implementation phases:

| Round | Phases Covered | Scope | Findings |
|---|---|---|---|
| 1 | 1-6 | Core infrastructure (Docker, Git, database, containers) | Covered in earlier sessions |
| 2 | 7-12 | Proxy, Deploy Pipeline, Auth, Monitor, Backup, Dashboard | 88 findings |
| 3 | 13-19 | Alerts, RBAC, Templates, Compose, i18n | 45 findings |
| 4 | 20-25 | Compose V2, Preview Envs, Deploy Hooks, IaC, Scaling, Uptime | 51 findings |

Round 2 produced the 88 findings that give this article its title. Rounds 3 and 4 added another 96 findings across later phases. Every critical and high finding across all rounds was fixed before we moved on. The approach was the same each time: enumerate findings by severity, parallelize the fixes across independent file sets, run the full test suite, and verify clean compilation.

---

The 88 Findings: Breakdown by Phase

Here is where the Round 2 findings landed:

| Phase | Component | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|---|
| 7 | Proxy Manager | 3 | 4 | 7 | 4 | 18 |
| 8 | Deploy Pipeline | 1 | 5 | 14 | 6 | 26 |
| 9 | Auth Module | 3 | 1 | 11 | 5 | 20 |
| 10 | Monitor | 0 | 0 | 1 | 2 | 3 |
| 11 | Backup Engine | 1 | 2 | 4 | 1 | 8 |
| 12 | Dashboard | 0 | 0 | 5 | 3 | 8 |
| -- | Integration | 1 | 0 | 3 | 1 | 5 |
| **Total** | | 9 | 12 | 45 | 22 | 88 |

Phase 8 (Deploy Pipeline) had the most findings -- 26. This makes sense: deployment pipelines touch user input, shell commands, file systems, network requests, and container orchestration. Every surface is an attack surface.

---

The Nine Critical Findings

1. Command Injection in Database Backup (Phase 11)

The backup engine interpolated the db_name parameter directly into shell commands passed to pg_dump, mysqldump, and mongodump. A database named test; rm -rf / would execute arbitrary commands inside the container.

```rust
// BEFORE: vulnerable to injection
let cmd = format!("pg_dump -U postgres {}", db_name);
Command::new("sh").arg("-c").arg(&cmd).output()?;
```

```rust
// AFTER: strict validation before any command construction
fn validate_db_name(name: &str) -> Result<(), BackupError> {
    if name.is_empty() || name.len() > 128 {
        return Err(BackupError::InvalidInput("Invalid database name length".into()));
    }
    if !name.chars().all(|c| c.is_alphanumeric() || c == '_' || c == '-') {
        return Err(BackupError::InvalidInput("Database name contains invalid characters".into()));
    }
    Ok(())
}
```

This was the first fix we implemented. Command injection in a PaaS is the worst possible vulnerability -- it gives attackers arbitrary code execution on the host.

2. WebSocket Handler Missing Authentication (Integration)

The stream_logs WebSocket handler accepted connections without extracting or verifying AuthUser. Any request -- authenticated or not -- could stream any application's logs. Logs often contain environment variables, database queries, and error messages with sensitive context.

The fix: extract and verify the JWT before upgrading the HTTP connection to WebSocket. Invalid or missing tokens receive a 401 before the upgrade handshake completes.

3. Timing Attack on API Key Comparison (Phase 9)

We used == to compare API key hashes. As discussed in Article 9, this leaks information through response timing. The fix: subtle::ConstantTimeEq for all hash comparisons.
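
The production fix uses the `subtle` crate, but the underlying idea can be illustrated with a dependency-free sketch: XOR every byte pair and fold the differences together, so the loop always runs to completion regardless of where the first mismatch occurs.

```rust
// Sketch of a constant-time byte comparison (the production code uses
// subtle::ConstantTimeEq instead of hand-rolling this). The loop visits
// every byte, so response timing does not reveal the position of the
// first mismatch. The length check is fine for fixed-length hashes.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching
    }
    diff == 0
}
```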

4-5. No Rate Limiting on Login/TOTP + Backup Codes Not Stored (Phase 9)

Unlimited login attempts make brute-force trivial. Unlimited TOTP attempts make the 6-digit code space (1,000,000 possibilities) crackable in minutes. We built an in-memory sliding-window rate limiter: 5 login attempts per 15 minutes, 5 TOTP attempts per 5 minutes.
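
The sliding-window idea can be sketched with std types only; the names below are illustrative, not the actual sh0 API. Each key tracks the timestamps of its recent attempts, and an attempt is allowed only while fewer than the limit fall inside the window.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Minimal sliding-window rate limiter sketch (illustrative, not the
// real sh0 implementation, which also handles eviction and concurrency).
struct RateLimiter {
    window: Duration,
    max_attempts: usize,
    attempts: HashMap<String, Vec<Instant>>,
}

impl RateLimiter {
    fn new(window: Duration, max_attempts: usize) -> Self {
        Self { window, max_attempts, attempts: HashMap::new() }
    }

    // Returns true if the attempt is allowed, false if rate limited.
    fn check(&mut self, key: &str) -> bool {
        let now = Instant::now();
        let entry = self.attempts.entry(key.to_string()).or_default();
        // Drop timestamps that have slid out of the window.
        entry.retain(|t| now.duration_since(*t) < self.window);
        if entry.len() >= self.max_attempts {
            return false;
        }
        entry.push(now);
        true
    }
}
```

With `RateLimiter::new(Duration::from_secs(900), 5)`, a sixth login attempt inside 15 minutes is rejected while other users remain unaffected.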

The backup codes issue was equally critical: the TOTP setup endpoint generated 10 backup codes and returned them to the user, but never stored them in the database. If a user lost their authenticator, the codes they wrote down were useless. We added a backup_codes_hash column and Argon2id hashing for each backup code, with single-use consumption on login.

6-8. SSRF via Unvalidated URLs (Phase 7)

Three related findings in the proxy manager. The Caddy admin URL, upstream addresses, and domain configurations all accepted arbitrary input without validation. An attacker could point the proxy to http://169.254.169.254 (cloud instance metadata endpoint) or internal services.

```rust
// Validate admin URL: must be localhost only
fn validate_admin_url(url: &str) -> Result<(), ProxyError> {
    let parsed = Url::parse(url).map_err(|_| ProxyError::InvalidUrl)?;
    match parsed.host_str() {
        Some("localhost") | Some("127.0.0.1") | Some("::1") => Ok(()),
        _ => Err(ProxyError::InvalidUrl),
    }
}
```

```rust
// Validate upstream: must be private IP range
fn is_private_ip(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback(),
        IpAddr::V6(v6) => v6.is_loopback(),
    }
}
```

9. No Concurrent Deploy Locking (Phase 8)

Two simultaneous deploys to the same application could cause port conflicts, container name collisions, and corrupted state. We added a per-app deploy lock using a concurrent `DashMap`, where each app ID maps to its own mutex. The lock is held for the duration of the deploy and released automatically via RAII.
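
The same shape can be sketched with std types only -- a map of app ID to `Arc<Mutex<()>>` behind an outer mutex. (The real code uses `DashMap` precisely to avoid that outer lock; names here are illustrative.)

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Std-only sketch of the per-app deploy lock. Each app ID maps to its
// own mutex; holding it for the duration of a deploy serializes deploys
// per app while different apps deploy concurrently.
// (Lock-poisoning handling is elided in this sketch.)
#[derive(Default)]
struct DeployLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl DeployLocks {
    // Fetch (or lazily create) the mutex for one app.
    fn lock_for(&self, app_id: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(app_id.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}
```

A deploy then acquires the guard with `let _guard = locks.lock_for(app_id); let _held = _guard.lock();` -- it is released automatically when the guard goes out of scope, which is the RAII behavior described above.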

---

The Twelve High-Severity Findings

High-severity findings would not grant immediate code execution but could lead to data exposure, service disruption, or privilege escalation.

Path traversal in backup storage -- The local backup storage resolved file paths with self.base_dir.join(key) without canonicalization. A key like ../../etc/passwd could escape the backup directory. Fix: canonicalize the resolved path and verify it starts with base_dir.
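
A purely lexical version of that check can be sketched without touching the filesystem (function name hypothetical). The production fix additionally canonicalizes the resolved path, which also defeats symlink tricks that a lexical check cannot see.

```rust
use std::path::{Component, Path, PathBuf};

// Lexical sketch of the traversal check: reject any key containing a
// component that could climb out of or escape the base directory
// (`..`, a root, a drive prefix, or `.`). Conservative by design.
fn resolve_backup_path(base_dir: &Path, key: &str) -> Option<PathBuf> {
    let key_path = Path::new(key);
    for component in key_path.components() {
        match component {
            Component::Normal(_) => {}
            _ => return None, // ParentDir, RootDir, Prefix, CurDir: reject
        }
    }
    Some(base_dir.join(key_path))
}
```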

Build logs exposing secrets -- Docker build output was streamed and stored without redacting environment variables. A build log for an application with DATABASE_URL set would contain the full database connection string, accessible to anyone with deploy permissions. We built a regex-based redaction filter:

```rust
fn redact_secrets(line: &str) -> String {
    let re = regex::Regex::new(
        r"(?i)([\w]*(?:KEY|SECRET|PASSWORD|TOKEN|CREDENTIAL|AUTH)[\w]*\s*=\s*)\S+"
    ).unwrap();
    re.replace_all(line, "${1}***REDACTED***").to_string()
}
```

User enumeration via timing -- Login attempts for non-existent users returned faster than attempts for real users with wrong passwords (because Argon2id hashing was skipped). Fix: always run a dummy hash comparison.

No webhook payload size limit -- GitHub and GitLab webhook endpoints accepted unbounded POST bodies. A crafted payload could exhaust server memory. Fix: DefaultBodyLimit::max(1_048_576) (1 MB) on webhook routes.

Unwrap calls in handlers -- Approximately 25 .unwrap() calls on serde_json::to_value() across 7 handler files. Each one was a potential panic in production. We replaced them all with a to_json() helper that returns ApiError::Internal on serialization failure.

The remaining high findings included: unsanitized Docker build args, hardcoded database credentials in the backup dump module, unlimited git clone depth, no ACME email validation, no body size limits on the general API, and unvalidated Caddy configurations.

---

Rounds 3 and 4: More Phases, More Findings

Round 3: Phases 13-19 (45 findings, 27 fixed)

The most significant findings in this round:

  • SSRF in alert webhooks: webhook dispatch URLs could target private IP ranges. We added the same private-IP rejection used in the proxy module.
  • HTML injection in alert emails: user-controlled fields (app name, alert description) were interpolated into HTML email bodies without escaping.
  • SMTP header injection: newlines in email subject fields could inject additional headers.
  • Volume mount path traversal: Docker Compose configurations could mount host paths, potentially exposing the host filesystem.
  • YAML bomb protection: no size limit on Compose YAML files. A 10 MB YAML with deeply nested anchors could consume gigabytes of memory during parsing. Fix: 256 KB hard limit.
  • Missing RBAC enforcement: several endpoints checked authentication but not authorization -- a viewer could perform developer-level actions.

Round 4: Phases 20-25 (51 findings, 37 fixed)

The critical findings here centered on two themes:

Command injection in cron and deploy hooks -- Cron job definitions and deploy hook commands accepted arbitrary shell input. We built validate_command():

```rust
const FORBIDDEN_CHARS: &[char] = &[';', '|', '&', '`', '>', '<', '\n', '\r'];
const FORBIDDEN_PATTERNS: &[&str] = &["$(", "${"];

pub fn validate_command(cmd: &str) -> Result<(), ApiError> {
    if cmd.len() > 4096 {
        return Err(ApiError::BadRequest("Command too long".into()));
    }
    for ch in FORBIDDEN_CHARS {
        if cmd.contains(*ch) {
            return Err(ApiError::BadRequest(
                format!("Command contains forbidden character: {}", ch)
            ));
        }
    }
    for pattern in FORBIDDEN_PATTERNS {
        if cmd.contains(pattern) {
            return Err(ApiError::BadRequest(
                format!("Command contains forbidden pattern: {}", pattern)
            ));
        }
    }
    Ok(())
}
```

SSRF in uptime monitoring -- Uptime check URLs could target private IP addresses, turning the monitoring system into an SSRF proxy. We implemented comprehensive private IP rejection covering RFC 1918 ranges, link-local addresses, CGNAT (100.64.0.0/10), and loopback.
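
That filter can be sketched entirely with std types (function name illustrative). The real check runs after DNS resolution, since a hostname can otherwise smuggle in a private address.

```rust
use std::net::IpAddr;

// Sketch of the uptime-monitor target filter: reject loopback, RFC 1918
// private ranges, link-local (where cloud metadata endpoints live), and
// CGNAT (100.64.0.0/10). The IPv6 arm here is deliberately minimal.
fn is_forbidden_target(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let octets = v4.octets();
            v4.is_loopback()
                || v4.is_private()     // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local()  // 169.254/16, incl. 169.254.169.254
                || (octets[0] == 100 && (octets[1] & 0b1100_0000) == 64) // CGNAT
        }
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}
```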

---

The Fix Process: Parallel Teams, Zero File Overlap

With 88 findings to fix in Round 2 alone, sequential remediation was not an option. We organized fixes into four parallel teams, each responsible for a set of crates with no overlapping files:

  • Team A: sh0-proxy (4 fixes, 15 new tests)
  • Team B: sh0-auth + sh0-db (3 fixes, 9 new tests)
  • Team C: sh0-backup (5 fixes, 13 new tests)
  • Team D: sh0-api + sh0-docker + sh0-git (13 fixes, ~15 new tests)

The key constraint: zero file overlap between teams. This eliminated merge conflicts and allowed all four streams to execute simultaneously. After all teams completed, a single integration pass verified that the combined changes compiled and all 206 tests (172 existing + 34 new) passed.

The same pattern applied to Rounds 3 and 4: group fixes by crate boundary, parallelize, verify.

---

Patterns We Saw Repeatedly

Across all four audit rounds, certain vulnerability categories appeared again and again:

1. Input validation at the boundary. Every place where user input enters the system -- HTTP request bodies, query parameters, webhook payloads, YAML files, cron expressions, command strings -- needs validation. The further input travels without validation, the harder the fix.

2. SSRF wherever URLs are accepted. If your system makes HTTP requests to user-provided URLs -- webhook dispatch, uptime monitoring, proxy configuration -- you need private IP filtering. Cloud metadata endpoints at 169.254.169.254 are the most common target, but internal services on 10.x.x.x and 172.16.x.x are equally dangerous.

3. Timing side channels. Any comparison of secrets -- API keys, passwords, TOTP codes -- must be constant-time. Standard string comparison leaks information through response timing.

4. Missing authorization after authentication. Checking that a user is logged in is not the same as checking that they have permission to perform an action. Several endpoints verified authentication (the user has a valid token) but not authorization (the user has the right role for this operation on this resource).

5. Panic-inducing unwraps. Every .unwrap() in a request handler is a potential denial-of-service vector. If the unwrap triggers on a malformed input, the handler panics, the Tokio task terminates, and the client gets an opaque 500 error. Replace every .unwrap() in request-handling code with proper error propagation.
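
The shape of that replacement, sketched with std types only (the error type here is a stand-in for the real `ApiError`, and the real helper wraps `serde_json::to_value`):

```rust
// Std-only sketch of the "no unwraps in handlers" pattern: convert the
// failure into an error value the framework can turn into a proper HTTP
// response instead of panicking the task.
#[derive(Debug, PartialEq)]
enum ApiError {
    Internal(String),
}

// BEFORE (panics on malformed input):
//   let n: u32 = input.parse().unwrap();
// AFTER: propagate the failure.
fn parse_limit(input: &str) -> Result<u32, ApiError> {
    input
        .parse::<u32>()
        .map_err(|e| ApiError::Internal(format!("bad limit parameter: {e}")))
}
```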

---

The Numbers After Remediation

| Round | Total Found | Critical Fixed | High Fixed | Medium Fixed | Tests Added |
|---|---|---|---|---|---|
| 2 | 88 | 9/9 | 12/12 | 4/45 | 34 |
| 3 | 45 | All CRITICAL | All HIGH | Most MEDIUM | Updated existing |
| 4 | 51 | 7/7 | 12/12 | 12/18 | 10 |

After all three remediation sessions:

  • cargo test: 312 tests passing
  • cargo clippy -- -D warnings: zero warnings
  • cargo build --release: clean compilation
  • Dashboard build: clean

---

Why Audit Yourself First

The standard advice is to hire a third-party penetration tester. That is good advice -- you should do it. But a third-party audit on code you have never reviewed yourself is a waste of money. They will spend half their time finding issues you could have found with a careful read-through, and you will pay their hourly rate for it.

Audit yourself first. Be systematic. Go phase by phase, file by file. Write down every finding with its severity, location, and proposed fix. Then fix the critical and high findings. Then bring in the external auditor, who can now focus on the subtle issues: business logic flaws, race conditions, cryptographic misuse patterns -- the things that require deep expertise to find.

We found 88 issues in Round 2 alone. If an external auditor had found those, it would have been an expensive engagement. Instead, we found them ourselves, fixed them in a single session, and added 34 regression tests to make sure they never come back.

---

What Remains

Not every finding was fixed immediately. The 45 medium and 22 low findings from Round 2 include items like:

  • JWT expiry reduction with refresh tokens (later implemented in the cookie migration)
  • Password complexity requirements
  • Account lockout after failed attempts
  • API key expiry and scoping
  • Build timeout enforcement
  • Deploy preview environments
  • Audit logging for all security events

These are real improvements, not theoretical. They are prioritized in the backlog, and we are working through them. But the critical and high findings -- the ones that could lead to code execution, data exposure, or privilege escalation -- are all fixed. That is the point of severity classification: fix what matters most, first.

---

Key Takeaways

1. Audit in rounds, not all at once. Breaking the audit into phases makes the task manageable and ensures coverage. You will not find everything in a single pass.

2. Parallelize fixes by file boundary. Group findings by crate or module, assign non-overlapping file sets, and execute simultaneously. This turned a multi-day remediation into a single session.

3. Every .unwrap() in handler code is a bug. Not eventually. Not theoretically. It is a denial-of-service vector today.

4. SSRF is everywhere. Any feature that makes HTTP requests to user-provided URLs needs private IP filtering. This includes webhooks, monitoring, proxy configuration, and health checks.

5. The number does not matter. The severity does. 88 findings sounds alarming. But the 9 critical findings are what mattered. Fix those, and the platform goes from "exploitable" to "hardened." Fix the remaining 79 at a measured pace.

---

Next in the series: Migrating from localStorage Tokens to HTTP-Only Cookies -- how we replaced the most common authentication anti-pattern in single-page applications.

Related Articles