
How We Work: A Typical CEO + AI CTO Session

What a typical development session looks like between a human CEO and an AI CTO.

Thales & Claude | March 25, 2026 · 11 min read

Tags: flin, methodology, workflow, ceo, ai-cto, session

Three hundred and one sessions produced a programming language. But what does a single session actually look like? How does a human CEO direct an AI CTO through the construction of compiler features, database engines, and security frameworks? What is the workflow that made 186,000 lines of Rust possible in 42 days?

This article pulls back the curtain on the methodology. Not the code, not the features -- the process. How Juste and Claude work together, session by session, to build FLIN.

The Session Structure

Every FLIN development session follows a consistent structure. Not because a process document mandates it, but because the structure evolved organically over 301 iterations into the most efficient pattern.

Phase 1: Context Loading (2-5 minutes)

The AI has no memory between sessions. Every session begins with context: what was built in the previous session, what the current state of the codebase is, what the goals for this session are.

After Session 254's hard-learned lesson, this phase includes mandatory start-of-session instructions that establish FLIN's architectural constraints. The CEO pastes a brief reminder document that includes:

  • FLIN's scope model (flat, not component-local)
  • Current test count and build status
  • The specific tracking file for the feature being worked on
  • Patterns that work (with code examples)
  • Patterns that do not work (with explanations)

This context loading is the most important part of the session. Every minute spent loading context saves ten minutes of debugging wrong assumptions.
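Condensed, such a start-of-session reminder might look like the sketch below. The values, file names, and session references are placeholders assembled from this article, not FLIN's actual state:

```
## Session start context (paste before any implementation work)

- Scope model: FLAT. Components do NOT have component-local scope.
- Build status: green. Tests: 3,620 passing (2,997 lib + 623 integration).
- Tracking file for this feature: docs/tracking/FM-3.md
- Pattern that works: trait-based backends behind a single constructor.
- Pattern that does NOT work: component-local state (see Session 254 notes).
```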

Phase 2: Goal Definition (1-2 minutes)

The CEO states what this session should accomplish. Goals are concrete and scoped:

```
Session goals (example from a typical session):

1. Implement storage backend trait with local filesystem backend
2. Add unit tests for the storage trait
3. Update tracking document
4. Commit with descriptive message

NOT in scope:
- Cloud storage backends (next session)
- File upload modifications
- Performance optimization
```

Scoping is critical. An AI given a vague goal ("improve the file system") will produce unpredictable results. An AI given a specific goal ("implement the StorageBackend trait with store, retrieve, delete, exists, and metadata methods, plus a LocalStorageBackend implementation") will produce exactly what is needed.
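The shape of that specific goal can be sketched in Rust. The signatures below are assumptions (FLIN's actual API may differ), and an in-memory backend stands in for the filesystem implementation:

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical metadata type; fields are assumptions.
pub struct ObjectMetadata {
    pub size: u64,
}

// The five methods named in the session goal.
pub trait StorageBackend {
    fn store(&mut self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn retrieve(&self, key: &str) -> io::Result<Vec<u8>>;
    fn delete(&mut self, key: &str) -> io::Result<()>;
    fn exists(&self, key: &str) -> bool;
    fn metadata(&self, key: &str) -> io::Result<ObjectMetadata>;
}

// In-memory stand-in for the LocalStorageBackend described above.
pub struct MemoryBackend {
    objects: HashMap<String, Vec<u8>>,
}

impl MemoryBackend {
    pub fn new() -> Self {
        Self { objects: HashMap::new() }
    }
}

impl StorageBackend for MemoryBackend {
    fn store(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.objects.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }
    fn retrieve(&self, key: &str) -> io::Result<Vec<u8>> {
        self.objects
            .get(key)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, key))
    }
    fn delete(&mut self, key: &str) -> io::Result<()> {
        self.objects
            .remove(key)
            .map(|_| ())
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, key))
    }
    fn exists(&self, key: &str) -> bool {
        self.objects.contains_key(key)
    }
    fn metadata(&self, key: &str) -> io::Result<ObjectMetadata> {
        let bytes = self.retrieve(key)?;
        Ok(ObjectMetadata { size: bytes.len() as u64 })
    }
}
```

A trait boundary like this is what makes "cloud storage backends (next session)" a clean follow-up: a future backend only has to implement the same five methods.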

Phase 3: Implementation (30-45 minutes)

This is where the code happens. The CEO describes what to build; the AI produces Rust code. The cycle is typically:

1. CEO describes a function, struct, or feature
2. AI produces the implementation
3. cargo check verifies compilation
4. cargo test verifies behavior
5. CEO reviews the output and directs adjustments
6. Repeat until the feature is complete

The feedback loop is measured in minutes. The AI produces a function; the compiler validates it in seconds; the CEO decides whether it meets requirements. There is no code review queue, no pull request process, no waiting for CI. The cycle time from "describe" to "validated" is 3-5 minutes.

```rust
// Typical implementation cycle:

// CEO: "Implement a download_grant function that generates
// a signed URL with expiration"

// AI produces:
pub fn download_grant(
    key: &str,
    config: &GrantConfig,
) -> Result<DownloadGrant> { // crate-local Result alias assumed
    let expires_at = SystemTime::now()
        .duration_since(UNIX_EPOCH)?
        .as_secs()
        + config.expires_in;

    let signature = hmac_sha256(
        &format!("{}:{}", key, expires_at),
        &config.signing_key,
    );

    Ok(DownloadGrant {
        url: format!(
            "/_flin/grants/{}?expires={}&sig={}",
            key,
            expires_at,
            hex::encode(&signature)
        ),
        expires_at,
    })
}

// cargo check: OK
// cargo test: 3 new tests pass
// CEO: "Add content-disposition header support"
// AI adjusts, cycle repeats
```

Phase 4: Testing and Verification (5-10 minutes)

Every session ends with a test run. cargo test --lib for unit tests, cargo test --test integration_e2e for integration tests. The test count is recorded. Zero failures is the standard; any failure is fixed before the session closes.

Phase 5: Documentation and Commit (5 minutes)

The session log is written. The tracking document is updated. The commit is made with a descriptive message that includes the session number, the features implemented, and the test count.

```
feat: StorageBackend trait + LocalStorage implementation (Session 214)

- Defined StorageBackend trait (store, retrieve, delete, exists, metadata)
- Implemented LocalStorageBackend with content-addressable storage
- Added 12 unit tests for storage operations
- Updated FM-3 tracking: 4/16 -> 8/16

Tests: 2,997 lib + 623 integration = 3,620 total
```

The Division of Labor

The CEO and AI CTO have distinct, complementary roles. Understanding these roles is essential to understanding why the model works.

The CEO (Juste) handles:

  • Product vision. What should FLIN be? Who is it for? What problems does it solve? These questions cannot be answered by code analysis -- they require understanding of markets, users, and competition.
  • Architectural decisions. Should FlinDB use B-trees or LSM trees? Should the temporal system use bitemporal or valid-time-only semantics? Should components have local state or flat scope? These decisions have far-reaching consequences and require judgment informed by trade-off analysis.
  • Prioritization. Should we finish the temporal system (5% remaining) or start security (0% complete)? Session 088's decision to move on from the temporal system at 95% was a prioritization call that only a human with product sense could make.
  • Quality control. Does this implementation match the design intent? Is the API ergonomic? Would a developer in Dakar find this syntax intuitive? The AI can produce correct code, but "correct" and "good" are different standards.
  • Strategic communication. Writing blog posts, documenting the journey, representing the project publicly. The AI contributes to content creation, but the human provides the narrative and the authenticity.

The AI CTO (Claude) handles:

  • Code production. Writing Rust implementations of features described by the CEO. The AI can produce hundreds of lines of correct, idiomatic Rust per minute.
  • Algorithm implementation. Pratt parsing, Hindley-Milner inference, AES-256-GCM encryption, BM25 scoring, HNSW indexing. These are well-documented algorithms that the AI implements accurately from its training data.
  • Test generation. Writing unit and integration tests that exercise the code from multiple angles. The AI generates comprehensive test suites because it can enumerate edge cases systematically.
  • Documentation. Session logs, API documentation, code comments. The AI produces structured documentation that captures the implementation details.
  • Bug analysis. When a test fails, the AI can trace through the code to identify the cause. Its ability to hold the entire codebase in context makes it effective at cross-module debugging.

The Tools

The toolchain is deliberately minimal:

Development Environment:
    Hardware:     MacBook
    Editor:       Terminal (iTerm2) -- no IDE
    Language:     Rust (via rustup)
    AI:           Claude (via API and CLI)
    VCS:          Git
    Build:        cargo build / cargo test / cargo clippy
    Database:     FlinDB (embedded, no external DB needed)
    CI/CD:        None (tests run locally)
    Project Mgmt: Markdown files in the repository

There is no IDE with FLIN-specific extensions (yet -- Session 252 built a VSCode extension, but development does not depend on it). There is no Docker for the development database (FlinDB is embedded). There is no CI/CD pipeline (tests run locally in under 30 seconds). There is no project management tool beyond Markdown tracking files.
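A tracking file in this setup is just a checklist in the repository. A hypothetical FM-3 snippet (the item names are illustrative, reconstructed from the commit message above, not FLIN's actual file):

```
# FM-3: File Management

Progress: 8/16

- [x] StorageBackend trait (store, retrieve, delete, exists, metadata)
- [x] LocalStorageBackend (content-addressable)
- [ ] Cloud storage backends
- [ ] Download grants with expiration
```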

This minimalism is a choice. Every tool in a development workflow has a maintenance cost. A CI/CD pipeline needs to be configured, updated, and debugged when it fails. An IDE extension needs to be kept in sync with language changes. A project management tool needs to be populated with tasks and kept current. When your team is two entities and your budget is $200/month, every tool must justify its existence.

The three commands that gate every session are:

cargo check                 # Does it compile?
cargo test                  # Does it behave correctly?
cargo clippy -- -D warnings # Does it follow Rust idioms?

If all three pass, the code is ready. The Rust compiler is the third team member -- it catches type errors, ownership violations, and common mistakes before the code runs. This three-way collaboration (human direction, AI implementation, compiler validation) is the engine of the CEO + AI CTO model.

Decision-Making Patterns

Over 301 sessions, several decision-making patterns emerged:

"Build it right or build it fast?" Almost always "build it right." FLIN's test count grew monotonically from Session 001 to Session 301. Every feature was tested before the next began. The AI's speed means that "right" and "fast" are not as much in tension as they are in traditional development. Writing 12 unit tests for a storage backend takes the AI 3 minutes. Skipping tests saves 3 minutes and costs hours when a regression appears later.
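Those dozen tests are small and mechanical. A hypothetical pair for a content-addressable key helper (the helper and its hash are illustrative stand-ins, not FLIN's code):

```rust
// Hypothetical helper: derive a storage key from content bytes.
// Uses FNV-1a for illustration only; a real content-addressable
// store would use a cryptographic hash.
fn content_key(bytes: &[u8]) -> String {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    format!("{:016x}", hash)
}

#[test]
fn same_content_same_key() {
    assert_eq!(content_key(b"hello"), content_key(b"hello"));
}

#[test]
fn different_content_different_key() {
    assert_ne!(content_key(b"hello"), content_key(b"world"));
}
```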

"When do we stop?" When the feature is production-ready, not when it is perfect. Session 088's decision to leave the temporal system at 95% is the canonical example. The remaining 5% (retention policies) was useful but not essential. Other features had higher ROI. Perfectionism is the enemy of progress.

"What do we build next?" The roadmap drives the sequence, but the CEO adjusts based on what the previous session revealed. If Session 212's audit shows that file management is further along than expected, the next session can skip planned work and move to the next uncovered area. Agility is not a methodology -- it is a response to information.

"When do we stop and analyze?" After three failures. This rule, codified after Session 254, prevents the AI from cycling through variations of the same wrong approach. Three failures on the same problem means the problem is not implementation -- it is understanding. Stop implementing. Start analyzing.

```
The Three-Failure Rule (established Session 254):

Attempt 1: Try the most obvious approach
Attempt 2: Try a variation if attempt 1 fails
Attempt 3: Try an alternative approach

If all three fail:
  STOP implementing
  READ existing examples
  ANALYZE the generated output
  UNDERSTAND the root cause
  THEN implement with correct understanding
```
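As control flow, the rule is a bounded retry loop that switches modes instead of retrying forever. A minimal sketch (`attempt` is a stand-in for one implementation attempt):

```rust
// Stand-in for one implementation attempt; in this illustration
// every attempt fails, to exercise the stop-and-analyze branch.
fn attempt(_n: u32) -> bool {
    false
}

fn three_failure_rule() -> &'static str {
    for n in 1..=3 {
        if attempt(n) {
            return "feature implemented";
        }
    }
    // Three failures: the problem is understanding, not implementation.
    "stop implementing; analyze the root cause first"
}
```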

What Makes It Work

The CEO + AI CTO model works for FLIN because of four factors that may not generalize to all projects:

Factor 1: Rust's compiler is a reliable validator. When the AI produces code, cargo check immediately verifies that it compiles. Type errors, ownership violations, and API misuses are caught in seconds. This fast, reliable validation loop is the foundation of the model's productivity. In a dynamically typed language, the AI could produce code that looks correct but fails at runtime in subtle ways.
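For example, a use-after-move that a dynamic language would only surface at runtime is rejected by cargo check before anything executes. A minimal illustration (the function names are made up):

```rust
// `consume` takes ownership of its argument.
fn consume(s: String) -> usize {
    s.len()
}

fn demo() -> usize {
    let s = String::from("flin");
    let n = consume(s); // ownership of `s` moves here
    // consume(s);      // error[E0382]: use of moved value: `s`
    //                  // caught by `cargo check`, never reaches runtime
    n
}
```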

Factor 2: FLIN is a self-contained project. There are no external service dependencies to configure, no API keys to manage (beyond the AI gateway), no third-party libraries to keep updated. The project's self-contained nature means that the AI can reason about the entire codebase without gaps.

Factor 3: The CEO has clear product vision. The FLIN specification was written before the first line of code. The design decisions -- temporal database, embedded UI, zero dependencies -- were made before implementation began. The AI never has to guess what the product should be. It implements a vision that is already defined.

Factor 4: The work is decomposable. Building a programming language is complex, but it decomposes into well-defined modules: lexer, parser, type checker, code generator, VM, database, server, security, UI. Each module has clear interfaces and can be worked on independently. This decomposability enables the session-based approach -- each session tackles one module or feature.
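That decomposition might map onto a crate layout like the following. This tree is hypothetical, assembled from the modules named above; FLIN's actual source tree may differ:

```
src/
  lexer/     # source text -> tokens
  parser/    # tokens -> AST (Pratt parsing)
  types/     # Hindley-Milner inference
  codegen/   # AST -> bytecode
  vm/        # bytecode execution
  db/        # FlinDB (embedded, temporal)
  server/    # HTTP server
  security/  # AES-256-GCM, grants
  ui/        # embedded UI
```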

These four factors -- reliable compiler validation, self-contained project, clear vision, and decomposable architecture -- create an environment where the CEO + AI CTO model thrives. Other projects may lack one or more of these factors, which would change the calculus.

The Session Cadence

The average across 301 sessions is 7.2 sessions per day. But averages obscure the reality. Some days had 18 sessions (January 15, the security sprint). Some days had 1 or 2 sessions (late in the project, during polish phases). The cadence was driven by the work, not by a schedule.

A session is not a fixed time block. It is a unit of focused work with a clear goal, a clear deliverable, and a clear end state. Some sessions lasted 30 minutes. Some lasted 4 hours. The duration depends on the complexity of the goal, not on a timer.

This flexibility is another advantage of the model. There is no standup meeting at 9 AM. No sprint planning on Monday. No retrospective on Friday. The work happens when the CEO is ready to direct it, at whatever pace the work demands.

Three hundred and one sessions. Forty-two days. A methodology that is simple enough to describe in one article and robust enough to produce a programming language. The process is the product.

---

This is Part 204 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built a programming language from scratch.

Series Navigation: - [203] 9 Agents Running in Parallel: The i18n Sprint - [204] How We Work: A Typical CEO + AI CTO Session (you are here) - [205] 42 Days, One Language, Zero Excuses
