#013 -- Pratt Parsing: How FLIN Reads Your Code

Pratt parsing is one of the most elegant algorithms in compiler design. It handles operator precedence without a grammar table, it is extensible without rewriting the main loop, and it fits in fewer than 200 lines of Rust. When we needed to build FLIN's parser -- a parser that handles arithmetic, comparisons, logical operators, member access, function calls, temporal queries, AI intent expressions, and HTML-like view constructs -- Pratt parsing was the obvious choice.

This article explains how Pratt parsing works, why we chose it over the alternatives, and how we extended it to handle FLIN's unusual features like entity construction, the @ temporal operator, and the ask/search intent expressions.

The Problem: Operator Precedence

Consider the expression 2 + 3 * 4. Every programmer knows this equals 14, not 20, because multiplication has higher precedence than addition. A parser must encode this knowledge.

The naive approach -- recursive descent with one function per precedence level -- works but scales poorly. You end up with a chain of functions: parse_assignment calls parse_or, which calls parse_and, which calls parse_equality, which calls parse_comparison, which calls parse_term, which calls parse_factor, which calls parse_unary, which calls parse_primary. Each function exists only to enforce a single precedence level. Adding a new operator means inserting a new function into the chain and updating the call graph.

FLIN has 10 precedence levels. A naive recursive descent parser would need 10 mutually recursive functions just for expressions. That is not terrible, but it is unnecessary. Pratt parsing collapses the entire precedence hierarchy into a single loop.

The Pratt Algorithm

Vaughan Pratt's insight, published in 1973, is that you can parse expressions using a single function that takes a minimum precedence level as a parameter. The algorithm:

Parse a prefix expression (a literal, an identifier, a unary operator, or a grouped expression).
While the next token is an infix operator with precedence greater than or equal to the minimum, parse the infix expression by recursively calling the same function with the operator's precedence as the new minimum.
Return the resulting expression tree.

The precedence levels are just numbers. Higher means tighter binding. Assignment is 1 (loosest). Postfix operators are 9 (tightest). The parser consults these numbers instead of a chain of functions.

In FLIN, we defined 10 precedence levels:

rust#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
#[repr(u8)]
pub enum Precedence {
    None       = 0,
    Assignment = 1,   // =         (right associative)
    Or         = 2,   // ||
    And        = 3,   // &&
    Equality   = 4,   // == !=
    Comparison = 5,   // < > <= >=
    Term       = 6,   // + -
    Factor     = 7,   // * / %
    Unary      = 8,   // ! - ++ -- (prefix)
    Postfix    = 9,   // ++ -- . [] () @
}

The #[repr(u8)] attribute ensures these are stored as single bytes, and the PartialOrd derive lets us compare precedence levels with < and >=. The ordering is implicit in the enum variant order -- Rust derives PartialOrd based on discriminant values.

The Core Loop

The heart of the Pratt parser is a method we call parse_expression, which delegates to parse_precedence:

rustfn parse_expression(&mut self) -> Result<Expr, ParseError> {
    self.parse_precedence(Precedence::Assignment)
}

fn parse_precedence(&mut self, min_prec: Precedence) -> Result<Expr, ParseError> {
    // Step 1: Parse prefix
    let mut left = self.parse_prefix()?;

    // Step 2: Parse infix operators while precedence allows
    while let Some(prec) = self.infix_precedence() {
        if prec < min_prec {
            break;
        }
        left = self.parse_infix(left, prec)?;
    }

    Ok(left)
}

That is the entire Pratt loop. Fourteen lines. The magic is in parse_prefix, parse_infix, and infix_precedence -- the three methods that tell the loop what to do with each token.

Prefix Parsing

The prefix parser handles tokens that appear at the start of an expression: literals, identifiers, unary operators, grouped expressions, and FLIN-specific constructs like ask and search.

Literals are the simplest case. When the parser sees an Integer, Float, String, or Bool token, it wraps the value in an Expr::Literal node and returns.

Unary operators (!, -, ++, --) consume the operator token, recursively parse the operand at Unary precedence, and return an Expr::Unary node. The recursive call with Unary precedence ensures that -x <em> 3 parses as (-x) </em> 3 rather than -(x * 3), because unary precedence (8) is higher than factor precedence (7).

Grouped expressions ((expr)) consume the opening parenthesis, parse the inner expression at the lowest precedence (to allow any operator inside), consume the closing parenthesis, and return the inner expression without wrapping it in an additional node. Parentheses are pure syntax -- they affect parsing order but leave no trace in the AST.

The ask expression is FLIN-specific: ask "What are the overdue invoices?" compiles to an AI intent query. The prefix parser recognizes the ask keyword, parses the following string expression, and returns an Expr::Ask node. Similarly, search "query" in Entity by field limit 5 is parsed as a structured expression with multiple clauses.

Infix Parsing

The infix parser handles tokens that appear between two expressions: binary operators, member access, function calls, index access, and the temporal @ operator.

For standard binary operators (+, -, *, /, ==, !=, <, >, &&, ||), the infix parser consumes the operator, recursively calls parse_precedence with the operator's precedence (for left-associative operators) or precedence minus one (for right-associative operators like assignment), and returns an Expr::Binary node.

The distinction between left and right associativity is a single line of code:

rustfn parse_infix(&mut self, left: Expr, prec: Precedence) -> Result<Expr, ParseError> {
    let op = self.advance(); // consume the operator

    match op.kind {
        // Right-associative: assignment
        TokenKind::Equal => {
            let right = self.parse_precedence(Precedence::Assignment)?;
            Ok(Expr::Assign { target: Box::new(left), value: Box::new(right) })
        }

        // Left-associative: everything else
        TokenKind::Plus | TokenKind::Minus |
        TokenKind::Star | TokenKind::Slash | TokenKind::Percent => {
            let right = self.parse_precedence(next_precedence(prec))?;
            Ok(Expr::Binary {
                left: Box::new(left),
                op: token_to_binop(&op),
                right: Box::new(right),
            })
        }

        // Member access: expr.field
        TokenKind::Dot => {
            let field = self.expect_identifier()?;
            // Check for method call: expr.method(args)
            if self.check(TokenKind::LeftParen) {
                let args = self.parse_call_args()?;
                Ok(Expr::MethodCall {
                    object: Box::new(left),
                    method: field,
                    args,
                })
            } else {
                Ok(Expr::Member {
                    object: Box::new(left),
                    field,
                })
            }
        }

        // Temporal access: expr @ time
        TokenKind::At => {
            let time_expr = self.parse_precedence(Precedence::Postfix)?;
            Ok(Expr::Temporal {
                expr: Box::new(left),
                time: Box::new(time_expr),
            })
        }

        // ... other infix operators
    }
}

Member access (.) and index access ([]) are infix operators at Postfix precedence, which is the highest. This ensures that a.b + c parses as (a.b) + c, not a.(b + c). The temporal @ operator also sits at Postfix precedence, so user.name @ yesterday parses as (user.name) @ yesterday -- access the name field, then query its value at yesterday's timestamp.

The Disambiguation Problem

FLIN has a syntactic ambiguity that required careful handling. Consider:

flinUser { name: "Juste" }

Is this an entity construction (create a User with name set to "Juste") or an identifier followed by a code block? The parser cannot tell from the current token alone -- it needs to look ahead.

We solved this with a looks_like_entity_construct method that peeks at the tokens after the opening brace:

rustfn looks_like_entity_construct(&self) -> bool {
    // Pattern: Identifier { Identifier : ...
    // If we see identifier-colon after the brace, it is entity construction.
    // If we see a statement keyword or no colon, it is a block.
    if let Some(TokenKind::Identifier(_)) = self.peek_kind() {
        if let Some(TokenKind::Colon) = self.peek_nth_kind(1) {
            return true;
        }
    }
    false
}

The heuristic is simple: if the first thing inside the braces is identifier: ..., it is an entity construction. Otherwise, it is a code block. This works because code blocks start with statements (keywords, assignments, expressions) and entity constructions start with field-value pairs. The two patterns are syntactically distinct after two tokens of lookahead.

This is a bounded lookahead -- we never look more than two tokens ahead. The parser remains O(n) in the number of tokens.

Short-Circuit Evaluation

Logical operators && and || require special treatment because they are short-circuit: the right operand should not be evaluated if the left operand determines the result. At the parsing level, this does not change the algorithm -- both are left-associative binary operators at their respective precedence levels. The short-circuit behavior is encoded later, in the code generator, using conditional jumps.

But the parser does something important: it tags && and || as BinOp::And and BinOp::Or in the AST, distinct from BinOp::BitwiseAnd or BinOp::BitwiseOr (which FLIN does not currently support, but the distinction is architecturally sound). This tagging ensures the code generator knows which operators require short-circuit emission without having to re-analyze the operator token.

Statement Parsing

While the Pratt parser handles expressions, FLIN's parser also needs to handle statements: variable declarations, entity declarations, control flow, save/delete operations, and route definitions. These are parsed by a separate dispatch method:

rustfn parse_statement(&mut self) -> Result<Stmt, ParseError> {
    match self.peek_kind() {
        Some(TokenKind::Keyword(Keyword::Entity)) => self.parse_entity_decl(),
        Some(TokenKind::Keyword(Keyword::Save))   => self.parse_save_stmt(),
        Some(TokenKind::Keyword(Keyword::Delete)) => self.parse_delete_stmt(),
        Some(TokenKind::Keyword(Keyword::If))     => self.parse_if_stmt(),
        Some(TokenKind::Keyword(Keyword::For))    => self.parse_for_stmt(),
        Some(TokenKind::Keyword(Keyword::Route))  => self.parse_route_stmt(),
        Some(TokenKind::Identifier(_))            => self.parse_var_or_expr_stmt(),
        _                                          => self.parse_expr_stmt(),
    }
}

Each statement type has its own parsing method. parse_entity_decl consumes the entity keyword, an identifier (the entity name), an opening brace, a sequence of field declarations, and a closing brace. parse_if_stmt consumes the if keyword, a condition expression, a brace-delimited body, and an optional else clause.

The parse_var_or_expr_stmt method handles another ambiguity: when the parser sees an identifier, it could be the start of a variable declaration (name = "Juste") or an expression statement (print("hello")). The parser peeks ahead for = or : to distinguish the two cases.

Control Flow: If and For

FLIN's if statement supports if, else if, and else chains:

flinif count > 10 {
    status = "high"
} else if count > 5 {
    status = "medium"
} else {
    status = "low"
}

The parser handles this recursively: after parsing the if body, it checks for an else keyword. If found, it checks whether the next token is another if (making it an else if chain) or an opening brace (making it a plain else). This recursive structure means if/else if/else chains are represented in the AST as nested If nodes, not as a flat list. The code generator handles this naturally with nested conditional jumps.

FLIN's for loop supports iteration over collections with an optional index variable:

flinfor item in items {
    save item
}

for item, index in items {
    // index is available
}

The parser consumes the for keyword, one or two identifiers (separated by a comma), the in keyword, a collection expression, and a brace-delimited body. The optional index variable is a small syntactic convenience that eliminates the need for a separate counter variable.

The Numbers

By the end of sessions 5 and 6, the parser was feature-complete for basic FLIN programs. The statistics tell the story:

Metric	Session 5	Session 6	Total
Parser lines of code	1,160	+600	~2,600
Tests	18 new	24 new	42 parser tests
Total tests (all phases)	125	149	149
Statement types	7	10	10
Expression types	Primary only	Full Pratt	All operators
Precedence levels	0	10	10

Session 5 built the parser core -- the Parser struct, the advance/check/expect methods, variable and entity declaration parsing. Session 6 added the Pratt parser for expressions and control flow statements. Together, they took about 75 minutes.

Error Recovery

A parser that stops at the first error is useless in practice. Real source files have multiple errors, and developers want to see all of them, not just the first one. FLIN's parser implements error recovery through a synchronize method:

rustfn synchronize(&mut self) {
    while !self.is_at_end() {
        // Stop at statement boundaries
        if self.previous_kind() == Some(TokenKind::Newline) {
            return;
        }
        match self.peek_kind() {
            Some(TokenKind::Keyword(Keyword::Entity)) |
            Some(TokenKind::Keyword(Keyword::Save))   |
            Some(TokenKind::Keyword(Keyword::Delete)) |
            Some(TokenKind::Keyword(Keyword::If))     |
            Some(TokenKind::Keyword(Keyword::For))    |
            Some(TokenKind::Keyword(Keyword::Route))  => return,
            _ => { self.advance(); }
        }
    }
}

When the parser encounters an error, it calls synchronize(), which advances through tokens until it reaches a likely statement boundary -- a newline or a keyword that starts a new statement. Then the parser resumes normal parsing. The error is recorded but does not abort compilation.

This approach is imperfect -- sometimes synchronization skips too many or too few tokens, producing cascading errors. But it is vastly better than stopping at the first error, and it is standard practice in production compilers.

Why Not a Parser Generator?

We could have used a parser generator like LALR, PEG, or tree-sitter to define FLIN's grammar and auto-generate the parser. We chose not to, for three reasons.

First, FLIN's modal syntax (code interleaved with HTML-like views) does not fit cleanly into any standard grammar formalism. A parser generator would have required awkward workarounds for the mode transitions that the hand-written lexer handles naturally.

Second, error messages from generated parsers are typically poor. "Expected token X, found token Y" is the best you get. A hand-written parser can produce messages like "entity declaration missing opening brace -- did you forget { after the entity name?" This level of specificity is important for a language targeting developers who may be new to programming.

Third, a hand-written Pratt parser is simple enough that it does not benefit from automation. The entire expression parser is under 200 lines. The statement parser adds another 400. A parser generator would save us maybe 200 lines of code while taking away our ability to customize error handling, lookahead behavior, and mode transitions.

What Came Next

The parser produces an AST -- an abstract syntax tree that represents the program's structure without any of the syntactic noise (parentheses, braces, semicolons) of the source code. The next article examines that AST in detail: what nodes it contains, how it represents FLIN's unique features (entities, views, temporal access, intent queries), and the design decisions that make it suitable for both type checking and code generation.

This is Part 13 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built a programming language from scratch.

Series Navigation: - [11] Session 1: Project Setup and 42 Keywords - [12] Building a Lexer From Scratch in Rust - [13] Pratt Parsing: How FLIN Reads Your Code (you are here) - [14] The Abstract Syntax Tree: FLIN's Internal Representation - [15] Hindley-Milner Type Inference in a Custom Language