
10 Sessions: From Zero to a Working Compiler

Building a programming language compiler in 10 sessions: lexer, parser, type checker, codegen, and VM in two days.

Thales & Claude | March 25, 2026 | 14 min

On January 1, FLIN was a name. By January 2, it had a lexer, parser, type checker, code generator, and virtual machine. Ten sessions. Two days.

Building a programming language compiler is supposed to take months. University courses spend an entire semester on it. The "Crafting Interpreters" textbook takes 800 pages to get from nothing to a working bytecode VM. The first C compiler took years. The first Go compiler took a team of three experienced engineers multiple months.

We did it in ten sessions across two calendar days, each session between 25 and 40 minutes, with a combined total of roughly five hours of implementation time. Not a toy. Not a calculator. A compiler for a language with 12 data types, 75+ bytecode opcodes, entity declarations, temporal operators, reactive views, AI-powered queries, and a virtual machine that can execute all of it.

This article is the story of those ten sessions -- what was built in each, the decisions that made the pace possible, and the moments where things could have gone wrong but did not.

The Setup: What Existed Before Session 001

Before the first session, FLIN had a specification. Not a vague idea or a wish list -- a specification. Six PRD documents:

  • PRD 01: Language Overview (design philosophy, target audience, use cases)
  • PRD 02: Syntax Specification (every construct, formal grammar in EBNF)
  • PRD 03: Type System (12 types, coercion rules, inference behavior)
  • PRD 04: Temporal Model (versioning semantics, @ operator, FlinDB integration)
  • PRD 06: Compiler Architecture (compilation pipeline, phase boundaries, code examples)
  • PRD 08: Bytecode Specification (opcode tables, file format, constant pool encoding)

These documents were written collaboratively -- Thales described what the language should do, Claude refined the technical details and wrote the specifications. By the time Session 001 started, every decision had been made. The token types were defined. The AST nodes were specified. The opcode values were assigned. The file format was designed.

This is the single most important factor in the compiler's rapid development. Not the AI's coding speed. Not the language's simplicity. The specifications. When you sit down to implement a lexer and you already know every token type, every keyword, and every edge case (how to distinguish < the comparison operator from < the tag opener), the implementation is purely mechanical. There are no design meetings. There are no "let me think about how this should work" pauses. You read the spec, you write the code.

Session 001: Project Setup and Token Definitions

Date: January 1, 2026. Duration: ~30 minutes.

The first session created the Rust project, defined all token types, and implemented the basic Span and Position types for source location tracking.

pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
    pub lexeme: String,
}

pub struct Span {
    pub start: Position,
    pub end: Position,
}

pub struct Position {
    pub line: u32,
    pub column: u32,
    pub offset: u32,
}

The TokenKind enum had over 60 variants, covering literals (integer, float, string, boolean), 30+ keywords (entity, save, delete, where, find, all, if, for, ask, search, now, yesterday...), operators (arithmetic, comparison, logical, temporal @, increment/decrement), delimiters, and the special HTML/view tokens (TagOpen, TagClose, TagSelfClose, TagEnd).
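An abridged sketch gives the flavor of such an enum. The variant names and keyword table below are illustrative, not FLIN's exact definitions:

```rust
// Hypothetical excerpt of a TokenKind-style enum; the real one
// has 60+ variants covering all literals, keywords, and operators.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    // Literals
    Integer(i64),
    Float(f64),
    Str(String),
    // Keywords (a small sample)
    Entity,
    Save,
    If,
    For,
    // Operators
    Plus,
    Less,    // `<` in code mode
    At,      // temporal `@`
    // View tokens
    TagOpen, // `<` in view mode
    TagClose,
    TagEnd,
    Identifier(String),
    Eof,
}

// Keyword recognition: after scanning an identifier-shaped lexeme,
// check it against the keyword table before falling back to Identifier.
fn keyword_or_identifier(lexeme: &str) -> TokenKind {
    match lexeme {
        "entity" => TokenKind::Entity,
        "save" => TokenKind::Save,
        "if" => TokenKind::If,
        "for" => TokenKind::For,
        _ => TokenKind::Identifier(lexeme.to_string()),
    }
}
```

Defining the full vocabulary up front is what let later sessions treat scanning as purely mechanical: every lexeme already had a destination variant.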

The session also set up Cargo.toml, the module structure, and the test framework. By the end, the project compiled and the token definitions were complete. No scanner yet -- just the vocabulary of the language.

Sessions 002-003: Scanner Implementation

Date: January 2, 2026. Duration: ~55 minutes combined.

Session 002 built the core scanner: reading characters, matching single and multi-character tokens, scanning strings, scanning numbers, and recognizing keywords versus identifiers.

Session 003 added the critical innovation -- the tri-modal lexer:

enum LexerMode {
    Code,            // Normal code: count = 0
    View,            // Inside HTML tags: <button click=...>
    ViewExpression,  // Inside {expr} within a view
}

FLIN source code switches between code mode and view mode within a single file. View blocks contain HTML-like syntax, embedded expressions, and event handlers. The lexer must produce different tokens depending on context -- < is Less in code mode but TagOpen in view mode.

The mode transitions are deterministic:

  • Seeing < followed by an alphabetic character switches from Code to View
  • Seeing { in View mode switches to ViewExpression
  • Seeing } in ViewExpression switches back to View
  • Seeing > or /> in View mode returns to the appropriate context
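The transition rules above can be sketched as a pure function over the current mode and lookahead. This is a simplified illustration: the real lexer presumably tracks nesting (e.g. a stack of modes) so that `>` can return to the correct enclosing context, which this sketch omits:

```rust
// Simplified sketch of the tri-modal lexer's deterministic transitions.
// Transitions out of View mode (`>` / `/>`) depend on nesting context
// and are not modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LexerMode {
    Code,
    View,
    ViewExpression,
}

fn next_mode(mode: LexerMode, cur: char, peek: Option<char>) -> LexerMode {
    match (mode, cur) {
        // `<` followed by a letter starts a tag: enter view mode.
        (LexerMode::Code, '<') if peek.map_or(false, |c| c.is_alphabetic()) => {
            LexerMode::View
        }
        // `{` inside a view opens an embedded expression.
        (LexerMode::View, '{') => LexerMode::ViewExpression,
        // `}` closes the embedded expression.
        (LexerMode::ViewExpression, '}') => LexerMode::View,
        _ => mode,
    }
}
```

Because every transition is keyed on a single character of lookahead, the lexer never needs to backtrack or guess.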

By the end of Session 003, the lexer could tokenize any valid FLIN program. 87 tests passed.

Session 004: AST Definition

Date: January 2. Duration: ~25 minutes.

Session 004 defined the complete Abstract Syntax Tree: the Program struct, the Stmt enum (11 variants: EntityDecl, VarDecl, Assignment, Save, Delete, If, For, Route, View, Style, Expr), the Expr enum (15 variants including EntityCreate, EntityQuery, Ask, Search, Temporal), and the view AST types (ViewElement, ViewAttribute, ViewChild, ViewIf, ViewFor).

This session wrote no logic -- it was pure type definition. But these types are the spine of the compiler. Every phase after this one either produces or consumes AST nodes. Getting the types right meant that subsequent sessions could lean on the Rust compiler to enforce correctness: if the parser produces a Stmt::If with a condition, then_branch, and else_branch, the type checker must handle all three fields, and the code generator must handle all three fields. Forget one, and the code does not compile.
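The exhaustiveness guarantee the paragraph describes can be seen in miniature. In this sketch (variant shapes are illustrative, not FLIN's actual AST), deleting any match arm is a compile error rather than a silent fallthrough:

```rust
// Minimal illustration of exhaustive matching over an AST enum.
// FLIN's Stmt has 11 variants; three stand in for them here.
enum Stmt {
    If {},
    Save,
    Delete,
}

fn describe(stmt: &Stmt) -> &'static str {
    // Remove any arm and rustc rejects the program: every phase
    // that consumes the AST is forced to handle every node kind.
    match stmt {
        Stmt::If { .. } => "if",
        Stmt::Save => "save",
        Stmt::Delete => "delete",
    }
}
```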

Sessions 005-006: Parser

Date: January 2. Duration: ~60 minutes combined.

Session 005 built the recursive-descent parser for statements and basic expressions. Session 006 replaced the expression parser with a Pratt parser (precedence climbing) and added control flow parsing.

The Pratt parser was the key architectural choice. Recursive descent works well for statements, where each statement type has a distinct leading token (entity, save, if, for, <). But for expressions, where operator precedence and associativity create complex nesting, a Pratt parser is both simpler and more correct:

fn parse_expression(&mut self, precedence: u8) -> Result<Expr, ParseError> {
    // Parse prefix (literal, identifier, unary operator, parenthesized expr)
    let mut left = self.parse_prefix()?;

    // Parse infix operators while precedence allows
    while !self.is_at_end() && precedence < self.current_precedence() {
        left = self.parse_infix(left)?;
    }

    Ok(left)
}

This 10-line function handles the entire expression grammar: arithmetic with correct precedence (a + b * c parses as a + (b * c)), comparison chains, logical operators with short-circuit semantics, postfix operators, field access, index access, function calls, and the temporal @ operator.
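The precedence logic lives in a small table. The levels below are illustrative (FLIN's actual values are in the specification), but they show the mechanism: the Pratt loop only consumes an infix operator when it binds tighter than the context it was called from, which is exactly why `b * c` is grouped before the `+` sees it:

```rust
// Hypothetical precedence table backing a Pratt parser.
// Higher number = binds tighter. Values are illustrative.
fn precedence(op: &str) -> u8 {
    match op {
        "or" => 1,
        "and" => 2,
        "==" | "!=" | "<" | ">" => 3,
        "+" | "-" => 4,
        "*" | "/" => 5,
        "@" => 6, // temporal operator binds tightest
        _ => 0,   // not an infix operator: the loop stops
    }
}
```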

The parser also handled FLIN's view syntax, which requires special care because HTML-like elements are statements that contain expressions:

<div class="counter">
    <h1>{title}</h1>
    <button click={count++}>Increment</button>
    <p>{count}</p>
    {if count > 10}
        <span>High count</span>
    {/if}
</div>

By the end of Session 006, 158 tests passed. The parser could handle every construct in the FLIN specification.

Sessions 007-008: Type Checker

Date: January 2. Duration: ~50 minutes combined.

Session 007 built the type checker foundation: the Type enum (12 types including SemanticText, Money, Entity, and Optional), scope management, and basic type inference.

Session 008 extended it with Hindley-Milner-style inference for expressions. The key insight was that FLIN's type system is intentionally simple -- no generics, no type parameters, no higher-kinded types. The type checker needs to:

1. Track declared entity schemas (entity User { name: text, email: text })
2. Infer types from literals (42 is Int, "hello" is Text)
3. Check binary operations (Int + Int is valid, Int + Text is not)
4. Infer query return types (User.all returns [User], User.count returns Int)
5. Verify entity field access (user.name is valid if User has a name field)
6. Validate temporal expressions (the @ operator preserves the base type)
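Duty 3 -- checking binary operations -- reduces to a match on operand type pairs. A minimal sketch, using a cut-down stand-in for the checker's Type enum (the real one has 12 types):

```rust
// Illustrative binary-operation check for `+`.
// Type is abridged; FLIN's enum includes SemanticText, Money,
// Entity, Optional, and more.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Float,
    Text,
    Bool,
}

fn check_add(lhs: &Type, rhs: &Type) -> Result<Type, String> {
    match (lhs, rhs) {
        (Type::Int, Type::Int) => Ok(Type::Int),
        (Type::Float, Type::Float) => Ok(Type::Float),
        (Type::Text, Type::Text) => Ok(Type::Text), // concatenation
        _ => Err(format!("cannot add {:?} and {:?}", lhs, rhs)),
    }
}
```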

pub fn infer_type(&mut self, expr: &Expr) -> Result<Type, TypeError> {
    match expr {
        Expr::Integer(_) => Ok(Type::Int),
        Expr::String(_) => Ok(Type::Text),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::EntityQuery { entity, operation } => {
            let entity_type = Type::Entity(entity.clone());
            match operation {
                QueryOp::All => Ok(Type::List(Box::new(entity_type))),
                QueryOp::Count => Ok(Type::Int),
                QueryOp::First | QueryOp::Find(_) =>
                    Ok(Type::Optional(Box::new(entity_type))),
                QueryOp::Where(_) | QueryOp::Order(_) =>
                    Ok(Type::List(Box::new(entity_type))),
            }
        }
        Expr::Temporal { expr, .. } => self.infer_type(expr),
        // ...
    }
}

By the end of Session 008, 193 tests passed. The type checker validated every expression and statement type in the language.

Session 009: Code Generator

Date: January 2. Duration: ~30 minutes.

This is the session described in the previous article. The code generator walks the typed AST and emits bytecode: 75+ opcodes, constant pool management with deduplication, jump patching for control flow, view instruction emission, entity method detection, optimized literal handling, and short-circuit boolean evaluation.

1,700 lines of Rust. 26 new tests. 219 total. The counter example compiled to valid bytecode.

Session 010: Virtual Machine

Date: January 2. Duration: ~35 minutes.

The final session of the sprint. Session 010 built the VM foundation: value representation, the operand stack, the call stack, global variable storage, heap allocation, and instruction dispatch for all 75+ opcodes.

pub struct VM {
    stack: Vec<Value>,
    frames: Vec<CallFrame>,
    ip: usize,
    globals: HashMap<String, Value>,
    heap: Vec<HeapObject>,
    free_list: Vec<usize>,
    bytes_allocated: usize,
    gc_threshold: usize,
    output: Vec<String>,
    debug: bool,
}

The Value enum uses a compact representation. Primitive values (None, Bool, Int, Float) are stored inline. Complex values (String, List, Map, Entity, Function, Closure) are stored on the heap, with the Value holding an ObjectId index into the heap array.

pub enum Value {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
    Object(ObjectId),
}
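One payoff of this layout can be verified directly. Assuming ObjectId is a plain index newtype (an assumption of this sketch), the entire Value fits in 16 bytes on a 64-bit target: an 8-byte payload plus discriminant and padding:

```rust
// Sketch of the compact value layout. On a 64-bit target this enum
// occupies 16 bytes; there is no heap traffic for primitives.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ObjectId(usize);

#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
    Object(ObjectId),
}
```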

The instruction dispatch loop is a match on the current opcode:

Fetch opcode at IP -> Decode operands -> Execute -> Advance IP -> Repeat

For arithmetic: pop two values, perform the operation, push the result. For jumps: read the target address from the operand, set IP. For LoadGlobal: read the constant pool index, look up the identifier string, find the global by name, push the value. For CreateElement: look up the tag name in the constant pool, create a new element on the VM's element stack.
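The fetch/decode/execute/advance cycle can be illustrated with a toy subset. This is not FLIN's VM -- just the same shape with three opcodes instead of 75+:

```rust
// Toy stack-machine dispatch loop: fetch the opcode at ip,
// execute it against the operand stack, advance ip, repeat.
#[derive(Debug, Clone, Copy)]
enum Op {
    PushInt(i64),
    Add,
    Mul,
}

fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0;
    while ip < code.len() {
        match code[ip] {
            Op::PushInt(n) => stack.push(n),
            Op::Add => {
                // Pop two operands, push the result.
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
        }
        ip += 1; // advance to the next instruction
    }
    stack.pop() // final result is left on top of the stack
}
```

Scaling this loop from three opcodes to 75+ changes the size of the match, not its structure -- which is why a working dispatch loop was achievable in a single session.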

Entity operations (Save, Delete, QueryAll) and temporal operations (AtVersion, AtTime) were implemented as stubs that record the intent but do not yet connect to FlinDB. View operations were similarly stubbed -- they emit events to the VM's output log but do not yet render to a real DOM. This was deliberate. The goal of Session 010 was to prove that the instruction dispatch loop works, not to build the entire runtime.

The critical test: the counter example. Compile count = 0; count++ through the full pipeline (lexer, parser, type checker, code generator), hand the bytecode to the VM, execute it, and verify that the global variable count holds the value 1 after execution.

It passed. 251 tests total. 32 new in this session alone.

The Numbers

| Session | Duration | Focus | Tests Added | Total Tests |
|---------|----------|-------|-------------|-------------|
| 001 | ~30 min | Project setup, token definitions | 12 | 12 |
| 002 | ~30 min | Core scanner | 38 | 50 |
| 003 | ~25 min | View mode, tri-modal lexer | 37 | 87 |
| 004 | ~25 min | AST definitions | 8 | 95 |
| 005 | ~30 min | Recursive descent parser | 30 | 125 |
| 006 | ~30 min | Pratt parser, control flow | 33 | 158 |
| 007 | ~25 min | Type checker foundation | 19 | 177 |
| 008 | ~25 min | Hindley-Milner inference | 16 | 193 |
| 009 | ~30 min | Code generator | 26 | 219 |
| 010 | ~35 min | Virtual machine | 32 | 251 |

Four hours and forty-five minutes of implementation time. 251 tests. A complete compilation pipeline from source text to executing bytecode.

Why This Worked

The specifications eliminated design decisions during implementation. Every session began with a clear target: "implement this phase as specified in PRD 06." There was no ambiguity about what token types existed, what the AST looked like, which opcodes to emit, or how the VM should dispatch them. The specs were the blueprint; the sessions were construction.

Each phase was a clean handoff. The lexer produces tokens. The parser consumes tokens and produces an AST. The type checker consumes an AST and produces a typed AST. The code generator consumes a typed AST and produces bytecode. The VM consumes bytecode and produces execution. No phase reaches backward. No phase knows about phases two steps away. This separation meant that each session could be implemented and tested in isolation.

The test suite was the progress tracker. Every session added tests before or alongside the implementation. When a test failed, the bug was in the code written in the current session -- not in code from three sessions ago. This is trivially true in hindsight, but it is the opposite of how many projects work, where tests are written after the fact and bugs are discovered far from their origin.

Rust's type system prevented integration bugs. When Session 009 (code generator) consumed the AST produced by Session 006 (parser), the Rust compiler guaranteed that every AST variant was handled. Forget to emit bytecode for Expr::Temporal? Compile error. Pass a Stmt where an Expr is expected? Compile error. These are the bugs that plague dynamically typed implementations of compilers, where a missing case in a switch statement silently falls through and produces incorrect output. In Rust, they cannot happen.

The CEO-AI CTO workflow eliminated overhead. Thales did not review pull requests. Claude did not wait for approval. Each session was a continuous implementation flow: spec to code to tests to commit. The traditional overhead of a software project -- standups, code reviews, environment setup, context switching -- was zero. Not reduced. Zero.

What Was Left for Later

Ten sessions built the compilation pipeline. They did not build the complete runtime. What remained after Session 010:

  • Memory management. The heap had allocation but no garbage collection.
  • FlinDB integration. Entity operations were stubs.
  • View rendering. View instructions logged output but did not produce DOM.
  • HTTP server. Route handlers existed in the AST but not in the runtime.
  • Hot reload. The compiler could compile once; it could not watch for changes and recompile.
  • Error diagnostics. The compiler reported errors, but without source context, colors, or suggestions.

These would be built in Sessions 011 through 018, each session extending the foundation that the first ten sessions had established. But the critical milestone was Session 010. At the end of Session 010, FLIN was no longer a specification. It was a working compiler.

The Lesson

The lesson is not "AI makes compilers easy." The lesson is that specification-driven development makes complex engineering achievable in compressed timeframes, and the CEO-AI CTO model eliminates the coordination overhead that normally stretches that timeframe by an order of magnitude.

The specifications took time to write. The PRDs were not weekend documents -- they were precise technical artifacts that anticipated edge cases, defined exact behavior, and provided implementation guidance in the form of Rust code examples. That investment paid for itself many times over. Five hours of implementation produced a working compiler because fifty hours of specification work had already resolved every design question.

Most software projects operate in the opposite direction. They start coding, discover design questions during implementation, stop to discuss them, resume coding, discover more questions, and iterate until the deadline arrives. The total implementation time may be the same, but it is spread over months and punctuated by costly context switches.

We compressed those months into two days by doing the thinking first and the typing second.

---

This is Part 18 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built a programming language compiler in sessions measured in minutes, not months.

Next in the series: The error diagnostic system -- how FLIN produces error messages that are written for humans, not compiler engineers.
