FLIN was built in 301 sessions. The audit covered 93 of those sessions' worth of code in a single analytical pass. Not 93 sessions of auditing -- 93 sessions of development, reviewed in one concentrated effort. The distinction matters. Building a feature in a session takes hours of exploration, experimentation, debugging, and refinement. Auditing that session's output takes minutes, because you are not building -- you are verifying. The creative friction is gone. What remains is systematic reading.
But systematic reading of code that was written across 93 sessions, by an AI CTO whose context resets between conversations, presents its own challenges. Each session had its own goals, its own constraints, and its own understanding of the codebase. Changes made in Session 200 might contradict assumptions from Session 50. A function added in Session 130 might be superseded by a different approach in Session 180 but never removed. The audit was not just a code review -- it was an archaeological expedition through 42 days of incremental construction.
## The Session Trail
Every session left traces in the code. Comment annotations marked which session introduced which feature. The audit used these annotations to reconstruct the development timeline and verify that later sessions did not break earlier invariants.
```rust
// Traces from FLIN's session history, found during the audit:

// Session 103: Modules (Import, From, As, Export)
// token.rs lines 325-334
Import, From, As, Export,

// Session 112: String interpolation
// scanner.rs lines 1038-1119

// Session 115: Switch, Watch, Context, Impl
// token.rs lines 157-159, 299-301, 315-316

// Session 118: Lazy, Hash, Fragment <>
// token.rs lines 305-306, 828-830

// Session 119: DocComment token
// token.rs lines 924-925, scanner.rs lines 676-706

// Session 133: Trait keyword
// token.rs lines 317-318

// Session 147: Static, Generic type handling
// token.rs lines 319-320, scanner.rs lines 576-582

// Session 150: Pipeline operator |>
// token.rs lines 807-808, scanner.rs lines 632-634

// Session 155: Loop labels
// token.rs lines 909-911, scanner.rs lines 656-658

// Session 195: WebSocket keywords
// token.rs lines 173-184, 386-390, 517-522

// Session 240: About keyword for RAG
// token.rs lines 201-202

// Session 252: URL pass-through (HTML compliance)
// scanner.rs lines 4056-4246
```

The session annotations served as a breadcrumb trail. When the audit found a suspicious pattern, it could trace the session that introduced it, check the session logs for context, and determine whether the pattern was intentional or accidental.
## The Audit's Coverage Model
Ninety-three sessions' worth of code does not mean 93 rounds of file reading. It means the code produced across 93 development sessions was read in a structured order that prioritized risk over chronology. The audit did not review Session 1's code first and Session 93's code last. Instead, it reviewed the riskiest files first, regardless of when they were written.
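That risk-first ordering can be sketched in a few lines of Rust. The struct fields and the scoring formula here are illustrative assumptions for this example, not FLIN's actual audit tooling:

```rust
// Illustrative sketch of risk-first audit ordering. The field names and
// scoring formula are assumptions for this example, not FLIN's real tooling.

struct AuditTarget {
    path: &'static str,
    lines: u32,
    prior_defects: u32, // bugs surfaced in earlier debugging sessions
}

// Weight defect history much more heavily than raw file size.
fn risk_score(t: &AuditTarget) -> u64 {
    u64::from(t.prior_defects) * 1_000 + u64::from(t.lines) / 100
}

fn main() {
    let mut targets = vec![
        AuditTarget { path: "lexer/token.rs", lines: 1_606, prior_defects: 0 },
        AuditTarget { path: "vm/vm.rs", lines: 27_257, prior_defects: 12 },
        AuditTarget { path: "parser/parser.rs", lines: 16_544, prior_defects: 3 },
    ];
    // Riskiest files first, regardless of when they were written.
    targets.sort_by_key(|t| std::cmp::Reverse(risk_score(t)));
    for t in &targets {
        println!("{} ({} lines)", t.path, t.lines);
    }
}
```

The key design choice is that defect history dominates the score: a small file that has bitten you before outranks a large file that never has.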
The coverage model had three tiers:
```
Tier 1: Line-by-line reading (5 files, 66,889 lines)
  vm/vm.rs               27,257 lines -- Every line, every opcode
  parser/parser.rs       16,544 lines -- Every parse function
  database/zerocore.rs   15,098 lines -- Every database operation
  codegen/emitter.rs      8,837 lines -- Every emit function
  vm/renderer.rs          7,540 lines -- Every render path

Tier 2: Function-level review (8 files, 49,563 lines)
  parser/ast.rs           5,162 lines -- Type definitions
  typechecker/checker.rs  7,715 lines -- Type checking logic
  server/http.rs          3,870 lines -- HTTP routing
  server/websocket.rs     3,200 lines -- WebSocket handling
  lexer/scanner.rs        4,248 lines -- Token scanning
  lexer/token.rs          1,606 lines -- Token definitions
  database/storage.rs     5,800 lines -- Storage backends
  vm/builtins/*          15,967 lines -- Built-in functions

Tier 3: Spot checking (92 files, 69,800 lines)
  Remaining modules, tests, configuration
```

Tier 1 files were read line by line because they had the highest defect density (based on prior debugging sessions) and the highest impact (any bug in the VM affects all FLIN programs). Tier 2 files were reviewed at the function level -- reading each function's signature, understanding its purpose, and verifying its error handling without tracing every line of implementation. Tier 3 files were spot-checked for common patterns: TODO comments, panic calls, dead code, and security anti-patterns.
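A Tier 3 spot check amounts to scanning source lines for a handful of known trouble patterns. A minimal sketch, assuming simple substring matching; the pattern list is illustrative, not FLIN's actual tooling:

```rust
// Minimal sketch of a Tier 3 spot check: flag lines that match known
// trouble patterns. The pattern list is illustrative, not FLIN's tooling.

fn spot_check(source: &str) -> Vec<(usize, &'static str)> {
    let patterns: [&'static str; 4] = ["TODO", "panic!(", ".unwrap()", "unimplemented!("];
    let mut findings = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        for pat in patterns {
            if line.contains(pat) {
                findings.push((idx + 1, pat)); // report 1-indexed line numbers
            }
        }
    }
    findings
}

fn main() {
    let src = "fn f() {\n    // TODO: handle errors\n    let v = g().unwrap();\n}\n";
    for (line, pat) in spot_check(src) {
        println!("line {}: {}", line, pat);
    }
}
```

A real pass would also need to skip test modules and string literals, which is exactly why spot checking is a triage step rather than a verdict.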
## What Session Archaeology Revealed
The most interesting audit findings were not bugs -- they were patterns in how the codebase evolved across sessions.
Feature accretion. The token vocabulary grew from approximately 30 tokens in early sessions to 80+ by Session 252. Each session that added a language feature also added tokens, keywords, and AST nodes. The accumulation was orderly -- new items were added at logical positions in the enum definitions, with session comments marking the additions -- but the sheer volume meant that the parser's match blocks grew to thousands of lines.
Defensive layering. Later sessions tended to add defensive checks around earlier sessions' code rather than refactoring it. When Session 200's code encountered a bug in Session 100's implementation, the fix was usually a wrapper or a guard condition rather than a rewrite of the original code. This is natural -- rewriting code from a previous session risks breaking features that depend on its current behavior. But it produces a codebase with redundant validation layers.
```rust
// Example: defensive layering visible in the audit

// Original (Session ~50): direct field access
fn get_entity_field(entity: &Entity, field: &str) -> Value {
    entity.fields[field].clone()
}

// After Session ~120: added existence check
fn get_entity_field(entity: &Entity, field: &str) -> Value {
    if let Some(value) = entity.fields.get(field) {
        value.clone()
    } else {
        Value::None
    }
}

// After Session ~180: added type coercion layer
fn get_entity_field(entity: &Entity, field: &str) -> Value {
    if let Some(value) = entity.fields.get(field) {
        match value {
            Value::Object(id) if is_string_object(id) => {
                // Coerce to Value::Text for consistency
                Value::Text(get_string(id).to_string())
            }
            other => other.clone(),
        }
    } else {
        Value::None
    }
}
```

Specification drift. Some features evolved away from their original specification as implementation reality set in. The audit found several cases where the code's behavior differed from the session logs' description of intent. These were not bugs in the traditional sense -- the code worked correctly for its actual use cases -- but they represented documentation debt that needed to be reconciled.
## The Audit's Session Log
The audit itself was structured as a series of audit sessions, distinct from FLIN's development sessions. Each audit session focused on a specific module or file set:
```
Audit Session 0:      Setup, dashboard, pre-audit findings
Audit Sessions 1-3:   Lexer module (5,880 lines)
  Finding: CLEAN. Zero TODOs, zero panics, zero dead code.
Audit Sessions 4-5:   Parser AST (5,192 lines)
  Finding: CLEAN. Comprehensive type definitions.
Audit Sessions 6-15:  Parser logic (16,544 lines)
  Finding: CLEAN in production code. 600 test panics (expected).
Audit Sessions 16-30: VM core (27,257 lines)
  Finding: 48 panics, duplicate CreateMap, 30% of all TODOs.
Audit Sessions 31-40: Renderer (7,540 lines)
  Finding: 9 panics, click handler bugs.
Audit Sessions 41-60: Database (28,395 lines)
  Finding: Persistence bugs, missing destroy, WAL gaps.
Audit Sessions 61-80: Server, storage, AI (27,982 lines)
  Finding: WebSocket gaps, missing S3 backend.
Audit Sessions 81-93: Remaining modules, summary
  Finding: Minor issues only.
```

The pattern was clear: the cleanest modules were the ones closest to computer science fundamentals (lexer, parser), and the most problematic modules were the ones that integrated multiple systems (VM, renderer, database). This makes sense -- integration points are where assumptions from different subsystems collide.
## The Exhaustive Audit Prompt
After the initial audit and all fixes were complete, we prepared for Phase 5: the exhaustive re-audit. This required a verification matrix covering every FLIN language feature:
```
Core Language: Variables, reactive updates, text interpolation,
               operators, ternary, logical operators
Events:        click, submit, change, input
Binding:       Two-way binding with bind={}
Conditionals:  {if}, {else}, nested
Loops:         {for item in list}, {for i in range}, nested
Entities:      CRUD, queries, time-travel, transactions
Routes:        GET/POST/PUT/DELETE, params, query, body, headers
Guards:        auth, csrf, role, rate_limit
Validation:    @required, @email, @min, @max, @one_of, etc.
Translations:  t(), language switching, layout translations
Sessions:      Read/write, persistence, theme, language
Components:    Props, children, click handlers
Layouts:       Named layouts, children placeholder
File Upload:   Type validation, size limits, save_file()
AI Features:   search, ask, ask_ai, ask_claude
WebSocket:     Connect, message, disconnect, broadcast
```

Each feature on this list was tested against the fixed codebase. The re-audit was not looking for new bugs -- it was confirming that the 21 fixes did not introduce regressions and that every documented feature still worked as specified.
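Conceptually, the re-audit reduces to a matrix of feature checks where any failure flags a regression. A minimal sketch with hypothetical names; FLIN's actual verification harness is not shown in the source:

```rust
// Illustrative sketch of the re-audit's verification matrix: each feature
// maps to a pass/fail check. Names and values here are hypothetical.

struct FeatureCheck {
    feature: &'static str,
    passed: bool, // result of exercising the feature against the fixed build
}

// The re-audit passes only when no documented feature regressed.
fn regressions(matrix: &[FeatureCheck]) -> Vec<&'static str> {
    matrix.iter().filter(|c| !c.passed).map(|c| c.feature).collect()
}

fn main() {
    let matrix = [
        FeatureCheck { feature: "two-way binding with bind={}", passed: true },
        FeatureCheck { feature: "entity time-travel", passed: true },
        FeatureCheck { feature: "WebSocket broadcast", passed: true },
    ];
    if regressions(&matrix).is_empty() {
        println!("re-audit clean: no regressions");
    }
}
```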
## The Value of Completeness
Reading every line of a 186,252-line codebase is an investment. It took the concentrated effort of multiple audit sessions over several days. But the return on that investment was disproportionate to the effort.
Before the audit, we knew FLIN worked for the cases we had tested. After the audit, we knew every case where it did not work. Before the audit, bugs were discovered through user reports and debugging sessions -- reactive, expensive, unpredictable. After the audit, every known defect was catalogued with its exact location and a proposed fix -- proactive, systematic, schedulable.
The 93 sessions audited in one pass represented the accumulated work of 42 days. The audit distilled that work into a single, coherent picture: what FLIN is, where it is strong, where it is weak, and exactly what needs to happen to make it ready for the world.
That picture -- the complete map of a programming language's internals, drawn from exhaustive reading rather than sampling -- is what made the subsequent fix sessions so fast. When you know exactly where every problem lives, fixing them is just a matter of walking the list. The audit was the map. The fixes were the walk. And at the end of the walk, FLIN was ready for beta.
This is Part 155 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.
Series Navigation:
- [154] Production Panic Calls: Tracking and Elimination
- [155] 93 Sessions Audited in One Pass (you are here)
- Next arc: FLIN's Path to Beta