
Building a Stack-Based Virtual Machine in Rust

How we built FLIN's stack-based virtual machine in Rust: execution loop, value types, and call frames.

Thales & Claude | March 25, 2026 | 13 min

The VM is where FLIN code comes alive. A stack, a program counter, and over a hundred opcodes -- that is the entire machine.

On January 2, 2026, in Session 010, we sat down to build the virtual machine that would execute every FLIN program ever written. Not an interpreter that walks an AST. Not a transpiler that generates JavaScript. A real bytecode virtual machine, written in Rust, with a dedicated operand stack, a call stack, a heap, and a garbage collector. The kind of thing that languages like Lua, Python, and Ruby are built on -- except ours needed to handle views, entities, and reactivity as first-class concerns.

This is the story of how we designed and implemented that VM in a single session, producing 2,850 lines of Rust code and 32 new tests. It is the foundation that everything else in FLIN stands on.

---

Why a Stack Machine

There are two dominant architectures for virtual machines: register-based and stack-based. The JVM and CPython use a stack. The Lua VM (since 5.0) and Dalvik use registers. Each has trade-offs.

We chose a stack machine for three reasons.

First, simplicity of code generation. FLIN's compiler emits bytecode from an AST. With a stack machine, every expression becomes a postfix sequence: push operands, apply operator. There is no register allocation pass, no graph colouring, no spill logic. The code generator walks the AST recursively and emits instructions in a single pass. That simplicity was critical when we were building the entire toolchain -- lexer, parser, type checker, code generator, and VM -- in a matter of days.

Second, compact bytecode. Stack instructions are typically one byte for the opcode plus zero to two bytes for operands. Register instructions need to encode source and destination registers, making each instruction wider. For FLIN, where bytecode would eventually travel over the network for hot module reload, smaller is better.

Third, the precedent of CPython and the JVM. Both are stack machines. Both have decades of proof that stack machines can be fast enough for real-world workloads. We were not trying to compete with V8's JIT compiler. We were trying to build a correct, maintainable VM in the fewest lines of code possible.

---

The Value Representation

Before the VM can execute anything, it needs to know what values look like at runtime. This is the Value enum -- the atom of the entire system:

```rust
pub enum Value {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
    Object(ObjectId),
}
```

Five variants. That is the entire runtime type system at the lowest level. None is the absence of a value. Bool, Int, and Float are immediate values -- they live directly on the stack with no heap allocation. Object(ObjectId) is a handle to something on the heap: a string, a list, a map, an entity instance, a function, a closure, or an iterator.

The decision to keep Int and Float as separate variants (rather than a unified Number type) was deliberate. FLIN has distinct Int and Float types in its type system, and we wanted the VM to preserve that distinction. Integer arithmetic is exact. Float arithmetic is IEEE 754. Conflating them would mean every integer operation pays for floating-point semantics.
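To make that distinction concrete, here is a minimal sketch of how an Add operation might dispatch on the variants. The enum is trimmed to the immediate values, and the wrapping overflow policy is an assumption for illustration, not FLIN's documented behaviour:

```rust
// Hypothetical sketch: dispatching Add on Value variants while keeping
// Int arithmetic exact and Float arithmetic IEEE 754. Trimmed enum;
// wrapping_add is an assumed overflow policy, not FLIN's actual one.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
}

impl Value {
    fn add(&self, other: &Value) -> Result<Value, String> {
        match (self, other) {
            // Integer + integer never passes through floating point.
            (Value::Int(a), Value::Int(b)) => Ok(Value::Int(a.wrapping_add(*b))),
            // Float + float follows IEEE 754 exactly.
            (Value::Float(a), Value::Float(b)) => Ok(Value::Float(a + b)),
            _ => Err(format!("cannot add {:?} and {:?}", self, other)),
        }
    }
}
```

Because the match is exhaustive over the pair of variants, a mixed Int/Float addition falls through to an error rather than silently coercing -- the coercion decision stays in the compiler, not the VM.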

Every heap-allocated object carries a header:

```rust
pub struct HeapObject {
    header: ObjectHeader,
    data: ObjectData,
}

pub struct ObjectHeader {
    type_tag: TypeTag,
    marked: bool,
    size: u32,
}

pub enum ObjectData {
    String(String),
    List(Vec<Value>),
    Map(HashMap<Value, Value>),
    Entity(EntityInstance),
    Function(FunctionData),
    Closure(ClosureData),
    NativeFunction(NativeFunctionData),
    Iterator(IteratorData),
}
```

The TypeTag discriminates object types. The marked bit is for garbage collection. The size field tracks how many bytes this object occupies on the heap, which the GC uses to decide when to trigger a collection.

This design means that stack operations on primitives (Int, Float, Bool, None) are zero-allocation. Only when the program creates a string, list, map, or entity does the VM touch the heap. For a counter program -- count = 0; count++ -- the heap is never touched at all.

---

The VM Struct

With values defined, the VM itself is surprisingly small:

```rust
pub struct VM {
    stack: Vec<Value>,              // Operand stack
    frames: Vec<CallFrame>,         // Call stack
    ip: usize,                      // Instruction pointer
    globals: HashMap<String, Value>,
    heap: Vec<HeapObject>,          // Object storage
    free_list: Vec<usize>,          // GC recycling
    bytes_allocated: usize,
    gc_threshold: usize,
    output: Vec<String>,            // Print output
    debug: bool,
}
```

The operand stack holds values that instructions operate on. The call stack holds CallFrame objects that track function calls. The instruction pointer (ip) indexes into the bytecode array. Globals are stored in a hash map keyed by name. The heap is a flat vector of objects, with a free list for recycling slots after garbage collection.

The CallFrame is equally compact:

```rust
pub struct CallFrame {
    return_ip: usize,
    base_pointer: usize,
    locals: Vec<Value>,
    function_name: String,
}
```

When the VM calls a function, it pushes a new CallFrame with the return address (so it knows where to continue after the function returns), a base pointer (for stack unwinding), a vector of local variables, and the function name (for error messages and debug output).

Two hard limits protect the system: a maximum stack depth of 65,536 values and a maximum call depth of 1,024 frames. The stack limit prevents accidental infinite loops from consuming all memory. The call depth limit prevents unbounded recursion from blowing the call stack. Both are generous enough for any reasonable program and restrictive enough to catch bugs early.

---

The Execution Loop

The heart of the VM is a loop that fetches, decodes, and executes instructions:

```rust
impl VM {
    pub fn execute(&mut self, chunk: &Chunk) -> Result<Value, RuntimeError> {
        loop {
            let instruction = chunk.code[self.ip];

            match instruction {
                OpCode::Halt => {
                    return Ok(self.pop_or_none());
                }

                OpCode::LoadConst(idx) => {
                    let value = chunk.constants[idx as usize].to_value();
                    self.push(value);
                }

                OpCode::Add => {
                    let b = self.pop();
                    let a = self.pop();
                    self.push(a.add(&b)?);
                }

                OpCode::Jump(offset) => {
                    self.ip = offset as usize;
                    continue; // Skip ip increment
                }

                OpCode::JumpIfFalse(offset) => {
                    let condition = self.pop();
                    if !condition.is_truthy() {
                        self.ip = offset as usize;
                        continue;
                    }
                }

                // ... 100+ more opcodes
            }

            self.ip += 1;
        }
    }
}
```

The pattern is the same for every instruction: fetch the opcode at the current instruction pointer, match on it, perform the operation (usually involving the stack), and advance the instruction pointer. Jump instructions set ip directly and continue to skip the automatic increment.

This is a classic switch-style dispatch loop. In production VMs like CPython, this pattern is sometimes replaced with computed-goto dispatch for better branch prediction. We stayed with match because Rust's compiler optimises it into a jump table when the variant count is large enough, and because clarity mattered more than squeezing out the last nanosecond.

One subtle detail: the continue statement after jump instructions. Without it, the VM would increment ip after setting it to the jump target, landing one instruction past where it should. This is a classic VM bug, and getting it wrong produces symptoms that look nothing like the actual cause -- values appear on the stack in the wrong order, functions return to the wrong address, conditionals take the wrong branch. We caught it in testing, but it cost us thirty minutes of debugging before we traced it to a missing continue.
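The bug is small enough to reproduce in a few lines. This toy dispatcher (hypothetical Op enum, not FLIN's real opcodes) runs the same program with and without the continue after a jump; the broken variant lands one instruction past the target:

```rust
// Tiny reproduction of the off-by-one described above. With the
// `continue`, execution resumes exactly at the jump target; without it,
// the ip increment at the bottom of the loop skips one instruction.
#[derive(Clone, Copy)]
enum Op {
    Jump(usize),
    Push(i64),
    Halt,
}

fn run(code: &[Op], skip_increment_after_jump: bool) -> Vec<i64> {
    let mut stack = Vec::new();
    let mut ip = 0;
    loop {
        match code[ip] {
            Op::Halt => return stack,
            Op::Push(v) => stack.push(v),
            Op::Jump(target) => {
                ip = target;
                if skip_increment_after_jump {
                    continue; // the `continue` that must not be forgotten
                }
            }
        }
        ip += 1;
    }
}
```

Jumping to instruction 2 in `[Jump(2), Push(1), Push(2), Halt]` should execute Push(2); the buggy variant skips straight to Halt and returns an empty stack.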

---

Stack Operations in Detail

The stack is the nervous system of the VM. Every value produced by an expression passes through it. Every function argument, every return value, every intermediate result lives on the stack for exactly as long as it needs to.

The core operations are deceptively simple:

```rust
fn push(&mut self, value: Value) {
    if self.stack.len() >= MAX_STACK_SIZE {
        panic!("Stack overflow: exceeded {} values", MAX_STACK_SIZE);
    }
    self.stack.push(value);
}

fn pop(&mut self) -> Value {
    self.stack.pop().expect("Stack underflow")
}

fn peek(&self) -> &Value {
    self.stack.last().expect("Stack underflow")
}
```

But the bytecode requires more than just push and pop. Consider the Dup instruction, which duplicates the top of the stack. It exists because many operations need to use a value twice -- once for a comparison and once for an assignment, for example. Without Dup, the compiler would need to emit a load instruction for the same variable twice, which is both larger and slower.

Swap exchanges the top two values. Over copies the second-from-top value to the top. Dup2 duplicates the top two values. These instructions exist because they eliminate redundant loads in common patterns. A simple a = a + 1 benefits from Dup to avoid reloading a after the increment.
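The stack-shuffling instructions are simple enough to sketch directly. Here they operate on a plain Vec of integers standing in for the operand stack -- a simplification of the real handlers, which work on Value and run inside the dispatch loop:

```rust
// Sketch of Dup, Swap, and Over over a bare Vec<i64> stand-in for the
// operand stack. The real handlers operate on Value and check limits.

// Dup: duplicate the top of the stack.
fn dup(stack: &mut Vec<i64>) {
    let top = *stack.last().expect("stack underflow");
    stack.push(top);
}

// Swap: exchange the top two values.
fn swap(stack: &mut Vec<i64>) {
    let len = stack.len();
    stack.swap(len - 1, len - 2);
}

// Over: copy the second-from-top value to the top.
fn over(stack: &mut Vec<i64>) {
    let second = stack[stack.len() - 2];
    stack.push(second);
}
```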

The VM also provides specialised load instructions for common constants: LoadInt0, LoadInt1, LoadIntN1, and LoadSmallInt. Loading the integer zero is so common (initialising counters, resetting state, checking for empty collections) that it deserves a single-byte instruction instead of the three-byte LoadConst followed by an index lookup.

---

Variables: Local and Global

FLIN has two scopes for variables: local (within a function) and global (at the module level). The VM handles them differently.

Global variables live in a HashMap. Loading a global means hashing the variable name and looking it up. Storing a global means inserting into the map. This is O(1) amortised but involves heap allocation for the key string.

Local variables live in the current CallFrame's locals vector. Loading a local means indexing into that vector by slot number. Storing a local means writing to that slot. This is O(1) with no allocation -- just an array index.

The compiler assigns each local variable a slot number at compile time. The first local in a function gets slot 0, the second gets slot 1, and so on. The slot number is encoded directly in the instruction:

```
LoadLocal 0    ; Load the first local variable
LoadLocal 1    ; Load the second local variable
StoreLocal 2   ; Store into the third local variable
```

For functions with more than 255 locals (rare, but possible in generated code), the VM provides LoadLocal16 and StoreLocal16 with 16-bit slot indices.

Two specialised instructions, IncrLocal and DecrLocal, increment or decrement a local variable in place without touching the stack. The pattern i = i + 1 is so common in loops that a dedicated instruction saves three stack operations per iteration.
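The saving is easy to see side by side. This sketch (locals modelled as a plain i64 slice, an assumption for brevity) executes i = i + 1 both ways -- four stack-threaded instructions versus one in-place update:

```rust
// Illustrative comparison: `i = i + 1` via the generic four-instruction
// sequence versus the dedicated IncrLocal. Locals are a bare i64 slice
// here; the real VM stores Value in the current CallFrame.

// LoadLocal slot / LoadInt1 / Add / StoreLocal slot
fn incr_via_stack(locals: &mut [i64], stack: &mut Vec<i64>, slot: usize) {
    stack.push(locals[slot]);            // LoadLocal slot
    stack.push(1);                       // LoadInt1
    let b = stack.pop().unwrap();        // Add: pop both operands...
    let a = stack.pop().unwrap();
    stack.push(a + b);                   // ...push the sum
    locals[slot] = stack.pop().unwrap(); // StoreLocal slot
}

// IncrLocal slot: the slot is bumped in place, no stack traffic.
fn incr_local(locals: &mut [i64], slot: usize) {
    locals[slot] += 1;
}
```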

---

Control Flow

Conditional execution and loops reduce to three jump instructions: Jump, JumpIfTrue, and JumpIfFalse. Each takes a target address as its operand.

Jump is unconditional -- it sets the instruction pointer and continues. JumpIfTrue and JumpIfFalse pop the top of the stack, test its truthiness, and jump (or not) based on the result.

Truthiness in FLIN follows simple rules: false and None are falsy. Everything else -- including the integer zero, the empty string, and the empty list -- is truthy. This diverges from Python (where zero is falsy) and JavaScript (where empty string is falsy), but it aligns with FLIN's philosophy that a value either exists or it does not. Zero is a valid count. An empty string is a valid string. Only the explicit absence of a value (None) and the explicit boolean false are falsy.
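Those rules collapse to a two-variant check. A minimal sketch, with the enum again trimmed to the immediate values:

```rust
// Sketch of FLIN truthiness as described above: only None and the
// explicit boolean false are falsy; everything else is truthy.
#[derive(Clone, Copy)]
enum Value {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
}

impl Value {
    fn is_truthy(&self) -> bool {
        !matches!(self, Value::None | Value::Bool(false))
    }
}
```

Note that `Value::Int(0)` and `Value::Float(0.0)` come out truthy, exactly where FLIN diverges from Python and JavaScript.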

Function calls use the Call instruction, which pops the callee and pushes a new CallFrame:

```
LoadGlobal "add"   ; Push the function onto the stack
LoadInt 3          ; Push first argument
LoadInt 5          ; Push second argument
Call 2             ; Call with 2 arguments
; Result is now on the stack
```

The Return instruction pops the return value, restores the instruction pointer from the CallFrame, truncates the stack to the frame's base pointer, and pushes the return value. The function's entire stack frame disappears in one operation.
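That one-operation teardown can be sketched in a few lines. Frame and the i64 stack are simplifications here; the real CallFrame also carries locals and a function name:

```rust
// Hedged sketch of the Return sequence: pop the result, pop the frame,
// restore the instruction pointer, truncate the operand stack to the
// frame's base pointer, and push the result back on top.
struct Frame {
    return_ip: usize,
    base_pointer: usize,
}

fn do_return(stack: &mut Vec<i64>, frames: &mut Vec<Frame>, ip: &mut usize) {
    let result = stack.pop().expect("missing return value");
    let frame = frames.pop().expect("return with no call frame");
    *ip = frame.return_ip;               // continue after the call site
    stack.truncate(frame.base_pointer);  // the whole frame disappears
    stack.push(result);                  // result replaces it
}
```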

For far jumps (targets beyond 65,535), the JumpFar instruction uses a 32-bit address. In practice, programs rarely exceed 65K instructions, but the VM is prepared for it.

---

The Counter Example: From Source to Execution

To see all of this working together, consider the simplest FLIN program:

```
count = 0
count++
```

The compiler emits this bytecode:

```
LoadInt0           ; Push 0 onto the stack
StoreGlobal $count ; Pop 0, store as global "count"
LoadGlobal $count  ; Push current value of count (0)
Dup                ; Duplicate: stack is [0, 0]
Incr               ; Increment top: stack is [0, 1]
StoreGlobal $count ; Pop 1, store as global "count"
LoadGlobal $count  ; Push current value of count (1)
Halt               ; Stop, return top of stack (1)
```

Eight instructions. The VM executes them in a few microseconds. At the end, the global variable count holds the value 1, and the VM returns Value::Int(1).

This tiny example exercises five of the VM's subsystems: the constant pool (the name $count), global variable storage, stack operations (Dup), arithmetic (Incr), and the execution loop itself. If any of these are broken, the counter example fails. That is why it was the first integration test we wrote.

---

Stub Implementations: Planning for the Future

Session 010 implemented all stack, arithmetic, comparison, logic, variable, list, map, property, and control flow opcodes. But FLIN's instruction set also includes opcodes for entity operations, view operations, intent operations, temporal operations, and built-in functions.

We stubbed all of them. Every unimplemented opcode returned a RuntimeError::NotImplemented with the opcode name. This meant the VM could load and begin executing any valid FLIN bytecode without panicking -- it would simply report which operation was not yet available.

This was not placeholder code. It was a contract. Each stub declared the exact stack effect of its opcode (how many values it pops, how many it pushes) even before the implementation existed. When we came back to implement entity operations in Session 011 and view operations in later sessions, the signatures were already defined. The compiler had been emitting those opcodes from the start. All we had to do was fill in the implementations.

---

What 2,850 Lines of Rust Bought Us

At the end of Session 010, the VM was 56% complete. The value representation was done. The core execution loop was done. All stack, arithmetic, comparison, logic, variable, list, map, and control flow opcodes were implemented. Thirty-two new tests passed, bringing the total to 251.

More importantly, the architecture was locked in. The Value enum would not change. The VM struct would grow (adding an entity store, a HOF stack, a view buffer), but its fundamental shape -- an operand stack, a call stack, a heap, and an instruction pointer -- was set.

When you build a virtual machine, you are building the ground floor of a skyscraper. Every other feature -- garbage collection, closures, views, reactivity, hot module reload -- stands on this foundation. If the foundation is wrong, everything above it is wrong. If the foundation is solid, everything above it has a chance of being right.

The foundation was solid. Thirty-two tests said so. And the Rust compiler, which had verified every type, every borrow, and every lifetime in those 2,850 lines, was the thirty-third.

---

This is Part 21 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built a programming language from scratch.

Next up: [22] Memory Management and Garbage Collection -- how we built a mark-and-sweep garbage collector that keeps FLIN's heap clean without pausing the world.
