
Auditing 186,000 Lines of Code

How we audited every line of the FLIN codebase -- 186,000 lines of Rust across 105 source files.

Thales & Claude | March 25, 2026 | 8 min read

There comes a point in every software project where you stop adding features and start asking hard questions. Is this codebase sound? Are there hidden bugs waiting to detonate in production? How many TODOs did we leave behind during the sprint of building a programming language in 42 days? For FLIN, that reckoning came on January 29, 2026 -- the day we decided to read every single line.

One hundred eighty-six thousand, two hundred fifty-two lines of Rust. One hundred five files. Three hundred one sessions of accumulated design decisions, midnight fixes, architectural pivots, and incremental refinements. The audit was not a sampling exercise or a static analysis pass. It was Claude, the AI CTO, reading the codebase line by line, file by file, module by module, from the lexer's first token definition to the AI gateway's last HTTP call.

This is the story of how we audited an entire programming language implementation -- and what we found.

Why a Manual Audit

Static analysis tools exist. Clippy catches common Rust mistakes. cargo test validates behavior. But none of these tools can answer the questions that matter most for a language runtime: Are there duplicate opcode handlers that silently produce different results? Do all TODOs represent deferred work or forgotten landmines? Are there production panic calls hiding in code paths that users will actually hit?

FLIN was built in an unprecedented fashion -- 301 sessions over 42 days, with a human CEO and an AI CTO pair-programming from Abidjan. The speed was extraordinary, but speed leaves artifacts. Each session solved a problem, but the cumulative effect of 301 sessions is a codebase that no single entity has reviewed end to end. Until now.

The decision was straightforward. Before FLIN could enter beta, someone had to read it all. Claude had written most of the code across those sessions, but context windows reset between conversations. The audit would be the first time the entire codebase was held in a single analytical pass.

The Architecture Under Review

FLIN's compilation pipeline follows a classical structure with a modern twist. Source code flows through six stages before execution:

Source (.flin)
    |
Scanner (5,877 lines) --> Tokens
    |
Parser (21,735 lines) --> AST
    |
Resolver (1,858 lines) --> Resolved AST
    |
Typechecker (9,925 lines) --> Typed AST
    |
Codegen (11,936 lines) --> Bytecode
    |
VM (61,054 lines) --> Execution
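
In code, the pipeline reads as a chain of fallible stages. Here is a minimal sketch of that shape -- the type and function names are illustrative assumptions, not FLIN's actual API, and the resolver and typechecker stages are elided for brevity:

```rust
// Hypothetical sketch of the compilation pipeline as chained Results.
// Every name here is invented for illustration.
#[derive(Debug)]
enum CompileError {
    Lex(String),
}

struct Tokens(Vec<String>);
struct Ast(Vec<String>);
struct Bytecode(Vec<u8>);

fn scan(source: &str) -> Result<Tokens, CompileError> {
    if source.trim().is_empty() {
        return Err(CompileError::Lex("empty source".into()));
    }
    Ok(Tokens(source.split_whitespace().map(String::from).collect()))
}

fn parse(tokens: Tokens) -> Result<Ast, CompileError> {
    Ok(Ast(tokens.0)) // a real parser would build a tree here
}

fn codegen(ast: Ast) -> Result<Bytecode, CompileError> {
    Ok(Bytecode(vec![ast.0.len() as u8])) // placeholder bytecode
}

fn compile(source: &str) -> Result<Bytecode, CompileError> {
    let tokens = scan(source)?; // Scanner -> Tokens
    let ast = parse(tokens)?;   // Parser  -> AST (resolver/typechecker elided)
    codegen(ast)                // Codegen -> Bytecode
}
```

Each stage returns a Result, so an error at any point short-circuits the rest of the pipeline via the `?` operator.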

The VM alone accounts for a third of the codebase. Within it, vm.rs weighs in at 27,257 lines -- the single largest file and the most critical one to audit. It contains every opcode handler, every native function binding, and the core execution loop that runs every FLIN program.
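
To make concrete what "every opcode handler" means, here is a minimal dispatch-loop sketch of the kind a stack VM centralizes. The opcodes and structure are illustrative only, not FLIN's actual instruction set:

```rust
// Toy stack-machine dispatch loop: one match arm per opcode.
// vm.rs contains the real (and vastly larger) version of this pattern.
#[derive(Debug, Clone, Copy)]
enum Op {
    PushConst(i64),
    Add,
    Halt,
}

fn run(program: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        match program[pc] {
            Op::PushConst(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
            }
            Op::Halt => return stack.pop().unwrap_or(0),
        }
        pc += 1;
    }
}
```

Multiply this pattern by hundreds of opcodes and native bindings and the 27,257-line figure becomes easier to picture.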

Supporting the core pipeline are the server module (17,908 lines for HTTP, WebSocket, and OAuth), the database module (28,395 lines for ZEROCORE, FLIN's embedded database), the storage module (7,866 lines for file backends including local, S3, and GCS), and the AI module (2,208 lines for embeddings and LLM integration).

The Audit Methodology

The audit proceeded in a strict order, prioritizing files that had previously exhibited bugs:

// Tier 1: Previously buggy files (VERIFY FIXES)
// vm/vm.rs         -- 27,257 lines -- duplicate opcode handlers
// vm/renderer.rs   --  7,540 lines -- click handler bugs
// database/zerocore.rs -- 15,098 lines -- persistence failures
// codegen/emitter.rs   --  8,837 lines -- missing codegen
// typechecker/checker.rs -- 7,715 lines -- type inference gaps

// Tier 2: Critical path (VERIFY CORRECTNESS)
// parser/parser.rs  -- 16,544 lines -- all syntax forms
// lexer/scanner.rs  --  4,248 lines -- all tokens
// server/http.rs    --  3,870 lines -- all HTTP routes

// Tier 3: Supporting files (SPOT CHECK)
// database/*  -- CRUD, transactions
// server/*    -- WebSocket, OAuth, guards
// storage/*   -- file backends
// ai/*        -- embeddings, providers

For each file, the audit captured five categories of findings: TODOs and FIXMEs, panic calls in production code paths, dead code, duplicate implementations, and security vulnerabilities. Every finding was logged with its exact file path, line number, severity level, and a proposed resolution.
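
A finding record along those lines might look like the following. The field names are assumptions for illustration; the article does not show the audit's actual log schema:

```rust
// Hypothetical shape of one logged audit finding.
#[derive(Debug)]
enum Severity {
    Critical,
    High,
    Medium,
    Low,
}

#[derive(Debug)]
enum Category {
    Todo,
    ProdPanic,
    DeadCode,
    Duplicate,
    Security,
}

#[derive(Debug)]
struct Finding {
    file: String,       // exact file path
    line: u32,          // exact line number
    category: Category, // one of the five audit categories
    severity: Severity,
    resolution: String, // proposed fix
}
```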

The Dashboard

After reading all 186,252 lines, the audit produced a summary that was reassuring and alarming in equal measure:

Category              Count    Critical  High  Medium  Low
TODOs                   30         2       4      11    13
Duplicate Code           1         1       -       -     -
Panic Calls (Prod)       5         -       2       3     -
Panic Calls (Test)    ~120         -       -       -     -
Dead Code                1         -       -       1     -
Security Issues          0         -       -       -     -
Unimplemented            0         -       -       -     -

Zero security issues across 186,000 lines of a web-facing language runtime. Zero unimplemented features that were expected to work. Thirty TODOs, most of which were low-priority polishing items rather than missing functionality. Five production panic calls that could theoretically crash the runtime under specific conditions.

And one critical duplicate -- a duplicate opcode handler that was silently producing different results depending on which code path executed it.

Module-by-Module Findings

The cleanest modules were the lexer and the parser AST. The lexer's 5,877 lines contained zero TODOs, zero panics, and zero dead code. Its three-mode state machine (Code, Tag, Content) was precisely implemented with proper Result<T, Vec<LexError>> error handling throughout. The parser AST, at 5,162 lines, was equally pristine -- comprehensive Display implementations, Span tracking on all nodes, and clean data structures with no panics.
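
The three-mode machine can be pictured as a mode enum with character-driven transitions. This is a guess at the shape for illustration only -- the transition triggers below are invented, not FLIN's actual lexing rules:

```rust
// Illustrative three-mode lexer state machine (Code, Tag, Content).
// The characters that trigger each transition are assumptions.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Mode {
    Code,
    Tag,
    Content,
}

fn transition(mode: Mode, ch: char) -> Mode {
    match (mode, ch) {
        (Mode::Code, '<') => Mode::Tag,    // markup begins
        (Mode::Tag, '>') => Mode::Content, // tag closed, body follows
        (Mode::Content, '<') => Mode::Tag, // next tag begins
        (m, _) => m,                       // otherwise stay put
    }
}
```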

The parser logic at 16,544 lines was a pleasant surprise. All 600+ panic! calls were confined to test code (line 8866 and beyond). The production parsing code used proper Result<T, ParseError> error handling throughout -- a testament to disciplined Rust programming even under the pressure of rapid development.

// Parser: all errors are proper Results, never panics
fn parse_expression(&mut self, min_bp: u8) -> Result<Expr, ParseError> {
    let mut left = self.parse_prefix()?;
    while let Some(op) = self.peek_infix_op() {
        let (l_bp, r_bp) = infix_binding_power(&op);
        if l_bp < min_bp {
            break;
        }
        self.advance();
        let right = self.parse_expression(r_bp)?;
        left = Expr::Binary {
            left: Box::new(left),
            op,
            right: Box::new(right),
            span: self.current_span(),
        };
    }
    Ok(left)
}

The VM was where the trouble lived. At 27,257 lines, it contained 48 panic calls in production code paths -- type assertions that should have been converted to Result returns. More critically, it contained the duplicate CreateMap opcode handler that would become the subject of the next article in this series. The database module at 28,395 lines carried the most TODOs, reflecting the complexity of implementing a full embedded database with WAL, transactions, time-travel, and multiple storage backends.
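
The fix pattern for those production panics is mechanical: replace the asserting accessor with one that returns a Result. A sketch of the before-and-after, with Value and RuntimeError as stand-in types rather than FLIN's actual ones:

```rust
// Illustrative before/after for converting a panicking type assertion
// into recoverable error handling. Type names are assumptions.
#[derive(Debug, Clone)]
enum Value {
    Int(i64),
    Str(String),
}

#[derive(Debug)]
struct RuntimeError(String);

// Before: a type mismatch crashes the entire runtime.
fn as_int_panicking(v: &Value) -> i64 {
    match v {
        Value::Int(n) => *n,
        other => panic!("expected Int, got {:?}", other),
    }
}

// After: the mismatch becomes an error the VM can surface to the user.
fn as_int(v: &Value) -> Result<i64, RuntimeError> {
    match v {
        Value::Int(n) => Ok(*n),
        other => Err(RuntimeError(format!("expected Int, got {:?}", other))),
    }
}
```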

What the Numbers Mean

One hundred eighty-six thousand lines is a substantial codebase for any project, let alone one built in 42 days. For context, the Lua interpreter is approximately 30,000 lines of C. SQLite is around 150,000 lines. FLIN, which includes a language runtime, a web server, an embedded database, a template engine, an AI integration layer, and a full type system, sits at 186,252 lines -- roughly the complexity of SQLite but spanning a much broader feature surface.

The audit confirmed that the codebase was fundamentally sound. The critical issues were concentrated in a few files -- primarily vm.rs and renderer.rs -- and the fixes were surgical rather than architectural. No module needed to be rewritten. No design decision needed to be reversed. The 301 sessions of incremental development had produced a coherent, working system with a small number of specific defects that the audit could enumerate and the subsequent fix sessions could eliminate.

// The audit's final tally
struct AuditSummary {
    sessions_completed: u32,    // 93 of 93
    files_audited: u32,         // 105 of 105
    lines_read: u32,            // 186,252 of 186,252
    todos_found: u32,           // 30
    critical_bugs: u32,         // 2
    security_issues: u32,       // 0
    test_count: u32,            // 3,452
    tests_passing: u32,         // 3,452
}

The Decision to Fix Everything

Some teams would look at 30 TODOs and 5 production panics and declare the project beta-ready. We made a different choice. Every TODO would be resolved. Every production panic would be eliminated or converted to proper error handling. The duplicate opcode would be unified. The missing codegen would be implemented. The WebSocket gaps would be filled.

This was not perfectionism. This was the recognition that FLIN is a language runtime -- a foundation on which other people will build their applications. A TODO in a web application is a minor debt. A TODO in a language runtime is a trap waiting for the developer who happens to exercise that code path. We owed it to FLIN's future users to close every open item before they encountered it.

The audit fix plan organized all 30 TODOs and 6 pre-audit bugs into five phases: Critical (week 1-2), High (week 3-4), Medium (week 5-6), Low (week 7-8), and Exhaustive Re-audit (week 9-12). In practice, the fixes would be completed far faster than planned -- all 21 fix items resolved in just five sessions across two days.

But that acceleration is a story for the articles to come.


This is Part 146 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [145] Previous article
- [146] Auditing 186,000 Lines of Code (you are here)
- [147] The Duplicate Opcode That Almost Broke Everything
