Real-Time File Watching and Diff Computation in Rust

Most developer tools interact with code at rest -- files on disk, commits in a repository, artifacts in a registry. 0diff interacts with code in motion. It watches files as they change, computes diffs in real time, filters out noise, attributes the change to a human or AI agent, and records everything. The entire pipeline runs in a synchronous event loop with no async runtime, no background threads beyond the OS file watcher, and no external services.

This article is a deep dive into how that pipeline works. We will walk through the watcher module (266 lines), the differ module (176 lines), the filter module (184 lines), and the configuration system (456 lines) that ties them together. Every code snippet is from the actual 0diff codebase. Every design decision has a reason.

The Event Loop: No Async, No Problem

The heart of 0diff is a synchronous event loop in watcher.rs. When you run 0diff watch, here is what happens:

The configuration is loaded from .0diff.toml.
All watched directories are scanned, and every matching file's contents are cached in a HashMap<PathBuf, String>.
An OS-level file watcher is registered using the notify crate with notify-debouncer-mini for event coalescing.
The main thread enters a loop, receiving filesystem events through a standard mpsc channel.

pub fn run(config: Config, format: OutputFormat) -> Result<(), Box<dyn std::error::Error>> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let shutdown_flag = shutdown.clone();
    ctrlc::set_handler(move || {
        shutdown_flag.store(true, Ordering::SeqCst);
    })?;

    let mut file_cache: HashMap<PathBuf, String> = HashMap::new();
    // ... seed cache from watched directories

    let (tx, rx) = std::sync::mpsc::channel();
    let mut debouncer = new_debouncer(Duration::from_millis(config.watch.debounce_ms), tx)?;
    // ... register watch paths

    while !shutdown.load(Ordering::SeqCst) {
        match rx.recv_timeout(Duration::from_millis(250)) {
            Ok(Ok(events)) => { /* handle each event */ }
            Ok(Err(error)) => { /* log error */ }
            Err(RecvTimeoutError::Timeout) => { /* check shutdown */ }
            Err(RecvTimeoutError::Disconnected) => { break; }
        }
    }

    let _ = history.rotate(config.history.max_size_mb, config.history.max_days);
    Ok(())
}

There are several deliberate choices here worth examining.

Why synchronous mpsc instead of tokio::sync::mpsc or async_channel? Because 0diff does not need async. The event loop has exactly one source of events (the file watcher) and one consumer (the main thread). There is no concurrent I/O, no network calls, no fan-out/fan-in. A synchronous channel with a 250ms receive timeout gives us everything we need: responsive event handling, periodic shutdown checks, and zero runtime overhead.

Why recv_timeout(250ms) instead of blocking recv()? The timeout serves two purposes. First, it lets us check the AtomicBool shutdown flag every 250 milliseconds, ensuring a responsive exit when the user presses Ctrl+C. Second, it creates a natural heartbeat that prevents the process from appearing stuck to process managers or monitoring tools.

Why AtomicBool for shutdown instead of a channel? The ctrlc crate invokes the handler on a dedicated thread that it spawns, not inside the raw signal handler, so the closure is not limited to async-signal-safe operations. A shared AtomicBool is still the simplest fit: the handler needs to communicate exactly one bit, the flag is cheap to move into the closure through an Arc, and the main loop already wakes up every 250 milliseconds to read it. That gives us clean shutdown with history rotation before exit.

Why notify-debouncer-mini? Raw filesystem events are noisy. A single file save in most editors triggers multiple events: a write to a temporary file, a rename, a metadata update, sometimes a delete-and-recreate. The debouncer coalesces these into a single event per file within a configurable window (default 500ms). This prevents 0diff from computing the same diff three times for one save operation.
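The shape of that loop can be reproduced with the standard library alone. The following is a minimal sketch, not the 0diff source: the function name `drain_until_shutdown` and the use of plain `String` events are assumptions made for illustration, and the debouncer is replaced by an ordinary channel.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Receives events until the shutdown flag is set or every sender is dropped.
/// Returns the events that were handled, in arrival order.
/// (Hypothetical helper; event payloads are plain strings for illustration.)
fn drain_until_shutdown(
    rx: &Receiver<String>,
    shutdown: &Arc<AtomicBool>,
    tick: Duration,
) -> Vec<String> {
    let mut handled = Vec::new();
    while !shutdown.load(Ordering::SeqCst) {
        match rx.recv_timeout(tick) {
            // An event arrived within the tick: handle it.
            Ok(event) => handled.push(event),
            // Nothing arrived: fall through so the flag is re-checked.
            Err(RecvTimeoutError::Timeout) => {}
            // All senders are gone: nothing more can arrive.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    handled
}
```

The tick bounds how long the loop can go without looking at the flag, which is the property the 250ms timeout provides in the real loop.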

The File Cache

When 0diff starts watching, it reads the current contents of every tracked file into an in-memory HashMap<PathBuf, String>. This cache serves as the baseline for diff computation. When a file changes, 0diff reads the new contents from disk, diffs them against the cached version, and then updates the cache.

This means 0diff computes diffs against the last observed state, not against the git HEAD or any other reference point. This is intentional. Git diffs tell you what changed since the last commit. 0diff diffs tell you what changed since the last time 0diff saw the file. In a multi-agent environment where agents are making rapid changes between commits, this real-time view is far more useful.

The trade-off is memory usage. The cache holds the full text of every watched file. For a typical project with a few hundred source files, this is negligible -- perhaps 10-50MB. For a monorepo with millions of lines, you would want to configure the watch paths carefully in .0diff.toml. The configuration system supports this with extension filters, path prefixes, and glob-based ignore patterns.
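The "diff against the last observed state" idea comes down to swapping the baseline on every observation. Here is a small sketch under stated assumptions: `swap_baseline` is a hypothetical name, and the real watcher updates its cache inside the change handler rather than through a helper like this.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Stores `new_contents` as the latest observed state of `path` and returns
/// the previously observed contents (empty for a file seen for the first time).
/// (Hypothetical helper, shown only to illustrate the baseline swap.)
fn swap_baseline(
    cache: &mut HashMap<PathBuf, String>,
    path: &Path,
    new_contents: &str,
) -> String {
    cache
        .insert(path.to_path_buf(), new_contents.to_string())
        .unwrap_or_default()
}
```

Each call returns the text that the next diff should treat as "old", so two rapid edits between commits produce two separate transitions rather than one cumulative diff against HEAD.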

The Diff Engine

The differ module is 176 lines of Rust built on the similar crate. The default algorithm in similar is Myers diff, which is also the default for git diff. We chose it over alternatives like diffy or imara-diff because it provides clean access to grouped operations with context lines, which is exactly what we need for producing readable, hunk-based diffs.

pub fn compute_diff(old: &str, new: &str, file_path: &str) -> FileDiff {
    let diff = TextDiff::from_lines(old, new);
    let mut hunks = Vec::new();
    let mut total_additions = 0;
    let mut total_deletions = 0;

    for group in diff.grouped_ops(3) {
        let mut lines = Vec::new();
        let mut old_start = 0;
        let mut new_start = 0;

        for op in &group {
            for change in diff.iter_changes(op) {
                let text = change.value().to_string();
                match change.tag() {
                    ChangeTag::Equal => lines.push(DiffLine::Context(text)),
                    ChangeTag::Insert => {
                        lines.push(DiffLine::Add(text));
                        total_additions += 1;
                    }
                    ChangeTag::Delete => {
                        lines.push(DiffLine::Delete(text));
                        total_deletions += 1;
                    }
                }
            }
        }

        hunks.push(DiffHunk {
            old_start,
            old_count: /* computed from ops */,
            new_start,
            new_count: /* computed from ops */,
            lines,
        });
    }

    FileDiff {
        file_path: file_path.to_string(),
        hunks,
        additions: total_additions,
        deletions: total_deletions,
    }
}

The grouped_ops(3) call is significant. It groups consecutive diff operations and includes 3 lines of surrounding context for each group, matching the default behaviour of git diff. This means 0diff's output is immediately familiar to any developer who has read a unified diff.

The output is a FileDiff struct containing a vector of DiffHunks, each with precise line range information (old_start, old_count, new_start, new_count) and a vector of DiffLine entries. This structured representation is what allows the rest of the pipeline -- filtering, display, JSON serialization -- to work with diffs as data rather than parsing text.
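To make "diffs as data" concrete, here is a sketch of what the hunk types might look like and how a consumer could use them. The field types are assumptions inferred from the snippet above, and `line_counts` and `hunk_header` are hypothetical helpers. In particular, deriving the counts from the hunk's lines is only one possible approach; the original computes them from the diff operations.

```rust
/// One line of a hunk. Variant names follow the snippet above.
#[derive(Debug, Clone, PartialEq)]
enum DiffLine {
    Context(String),
    Add(String),
    Delete(String),
}

/// A hunk with its line ranges. Field types are assumed to be `usize`.
struct DiffHunk {
    old_start: usize,
    old_count: usize,
    new_start: usize,
    new_count: usize,
    lines: Vec<DiffLine>,
}

/// Derives (old_count, new_count) from the lines themselves:
/// context lines exist on both sides, deletions only on the old side,
/// additions only on the new side.
fn line_counts(lines: &[DiffLine]) -> (usize, usize) {
    let mut old = 0;
    let mut new = 0;
    for line in lines {
        match line {
            DiffLine::Context(_) => {
                old += 1;
                new += 1;
            }
            DiffLine::Delete(_) => old += 1,
            DiffLine::Add(_) => new += 1,
        }
    }
    (old, new)
}

/// Renders the familiar unified-diff hunk header from the structured fields.
fn hunk_header(h: &DiffHunk) -> String {
    format!(
        "@@ -{},{} +{},{} @@",
        h.old_start, h.old_count, h.new_start, h.new_count
    )
}
```

Because the ranges are plain fields, a terminal renderer and a JSON serializer can both work from the same value without re-parsing diff text.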

Whitespace Filtering

One of the most common sources of diff noise is whitespace changes. An editor reformats indentation. A linter adjusts trailing spaces. A developer switches between tabs and spaces. These changes produce diffs that obscure the meaningful modifications.

The filter module (184 lines) addresses this with a targeted approach. Rather than ignoring all whitespace in diff computation (which would hide legitimate formatting changes), it post-processes the diff hunks and removes only those where every change is purely whitespace:

fn is_whitespace_only_hunk(hunk: &DiffHunk) -> bool {
    let adds: Vec<&str> = hunk.lines.iter()
        .filter_map(|l| match l {
            DiffLine::Add(s) => Some(s.as_str()),
            _ => None,
        })
        .collect();

    let dels: Vec<&str> = hunk.lines.iter()
        .filter_map(|l| match l {
            DiffLine::Delete(s) => Some(s.as_str()),
            _ => None,
        })
        .collect();

    // If counts don't match, it's a real structural change
    if adds.len() != dels.len() {
        return false;
    }

    // Empty hunks with only context lines are not whitespace-only
    if adds.is_empty() {
        return false;
    }

    // Every add/delete pair must differ only in whitespace
    adds.iter().zip(dels.iter()).all(|(a, d)| {
        normalize_whitespace(a) == normalize_whitespace(d)
    })
}

The logic is careful about edge cases. If the number of additions does not match the number of deletions, it is not a whitespace change -- lines were added or removed, not just reformatted. If there are no additions or deletions at all (just context lines), it is not a whitespace-only hunk. Only when every added line has a corresponding deleted line and they differ only in whitespace (leading, trailing, and internal runs collapsed) does the filter remove the hunk.

The normalize_whitespace function trims leading and trailing whitespace, then collapses all internal whitespace runs into a single space. This catches the common cases: re-indentation, tab-to-space conversion, trailing whitespace removal, and alignment changes.
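A minimal sketch of a normalization with exactly that behaviour is shown below. It is written from the description above, not copied from the 0diff source, so the body is an assumption.

```rust
/// Trims leading and trailing whitespace and collapses every internal run of
/// whitespace (spaces, tabs, newlines) into a single space.
/// (Sketch based on the prose description; the real implementation may differ.)
fn normalize_whitespace(s: &str) -> String {
    // `split_whitespace` skips leading/trailing whitespace and treats any
    // run of whitespace as one separator, so joining with " " does both jobs.
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

One consequence of collapsing runs rather than deleting them: `"if (x)"` and `"if ( x )"` normalize to different strings, because the second introduces spaces where the first had none. Re-indentation and tab-to-space conversion, by contrast, normalize to the same string.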

This filtering is controlled by the filter.ignore_whitespace configuration option. When enabled (the default), whitespace-only hunks are stripped before the change is recorded. The developer still sees meaningful changes clearly, while automated formatting noise is suppressed.

The Full Pipeline

When a file changes on disk, the watcher module orchestrates the entire pipeline. Here is the flow, simplified but faithful to the actual implementation:

fn handle_file_change(
    path: &Path,
    relative: &Path,
    cache: &mut HashMap<PathBuf, String>,
    config: &Config,
    git: &GitInfo,
    detector: &AgentDetector,
    history: &mut HistoryStore,
    format: &OutputFormat,
) -> Result<(), Box<dyn std::error::Error>> {
    // 1. Read the new file contents
    let new_contents = std::fs::read_to_string(path)?;
    let old_contents = cache.get(path).cloned().unwrap_or_default();

    // 2. Compute the diff against the cached version
    let rel_str = relative.to_string_lossy();
    let diff = differ::compute_diff(&old_contents, &new_contents, &rel_str);

    // 3. Apply whitespace filtering if configured
    let diff = if config.filter.ignore_whitespace {
        filter::filter_whitespace_changes(diff)
    } else {
        diff
    };

    // 4. Check if the change meets the minimum threshold
    if diff.hunks.is_empty()
        || (diff.additions + diff.deletions) < config.filter.min_lines_changed
    {
        cache.insert(path.to_path_buf(), new_contents);
        return Ok(());
    }

    // 5. Get git metadata (author, branch)
    let author = git.get_author();
    let branch = git.get_branch();

    // 6. Detect AI agent
    let commit_info = git.get_last_commit();
    let agent = detector.tag_for_entry(commit_info.as_ref());

    // 7. Create and record the history entry
    let entry = HistoryEntry {
        timestamp: Utc::now().to_rfc3339(),
        file: rel_str.to_string(),
        additions: diff.additions,
        deletions: diff.deletions,
        author,
        branch,
        agent,
        summary: format!("{} additions, {} deletions", diff.additions, diff.deletions),
    };

    history.append(&entry)?;

    // 8. Display to terminal or emit JSON
    display::print_change(&entry, &diff, format);

    // 9. Update the cache
    cache.insert(path.to_path_buf(), new_contents);

    Ok(())
}

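Step 4 is the noise gate. A change is dropped when filtering leaves no hunks or when additions plus deletions fall below min_lines_changed. With the default of 1, only diffs that end up empty are dropped; raising the threshold suppresses small edits such as a single added newline. Combined with the whitespace filter, this keeps the history log focused on meaningful modifications.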
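The gate in step 4 can be read as a single predicate. The sketch below restates the condition from the handler in positive form; `passes_noise_gate` is a hypothetical name, and the parameters are plain integers rather than the real `FileDiff` and `Config` types.

```rust
/// Returns true when a change should be recorded: at least one hunk survived
/// filtering and the total number of changed lines meets the threshold.
/// (Hypothetical helper mirroring the condition in step 4.)
fn passes_noise_gate(
    hunk_count: usize,
    additions: usize,
    deletions: usize,
    min_lines_changed: usize,
) -> bool {
    hunk_count > 0 && additions + deletions >= min_lines_changed
}
```

With a threshold of 1, a single added line passes the gate; with a threshold of 2, it does not.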

File deletions follow a parallel path. When the watcher detects a deletion event, it computes the diff as a full removal (every line in the cached version becomes a deletion), records it to history, and removes the file from the cache. This ensures that file deletions are tracked with the same fidelity as modifications.
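As a rough illustration of the deletion path, the sketch below removes a file from the cache and returns its cached lines, each of which would be reported as a deletion. The helper name `take_deleted_lines` is an assumption, and recording to history is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Removes `path` from the cache and returns its last observed lines, which a
/// full-removal diff would list as deletions. Returns `None` when the file was
/// never cached. (Hypothetical helper; history recording is omitted.)
fn take_deleted_lines(
    cache: &mut HashMap<PathBuf, String>,
    path: &Path,
) -> Option<Vec<String>> {
    let old = cache.remove(path)?;
    Some(old.lines().map(str::to_string).collect())
}
```

Removing the entry in the same step matters: if the file is recreated later, the next diff starts from an empty baseline instead of stale contents.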

The Configuration System

The config module is the largest single module at 456 lines, and for good reason. A file watcher that cannot be configured is useless -- every project has different file types, different directory structures, different noise sources.

0diff uses TOML for configuration, stored in .0diff.toml at the project root. The configuration has five sections:

[watch] -- Which directories to watch, which file extensions to track, which patterns to ignore, and the debounce interval.
[filter] -- Whether to ignore whitespace changes and the minimum line change threshold.
[git] -- Whether to extract git metadata and how.
[history] -- Where to store the history file, maximum size for rotation, and maximum age in days.
[agents] -- Custom agent detection patterns beyond the built-in ones.
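The sections map naturally onto nested structs with defaults. The sketch below covers only the options whose names and default values appear in this article (debounce_ms, ignore_whitespace, min_lines_changed, max_size_mb, max_days). The struct names, field types, and the default watch paths and ignore entries are assumptions, and the [git] and [agents] sections are omitted because their keys are not shown here.

```rust
/// Assumed shape of the [watch] section.
struct WatchConfig {
    paths: Vec<String>,
    extensions: Vec<String>,
    ignore: Vec<String>,
    debounce_ms: u64,
}

/// Assumed shape of the [filter] section.
struct FilterConfig {
    ignore_whitespace: bool,
    min_lines_changed: usize,
}

/// Assumed shape of the [history] section (file location omitted).
struct HistoryConfig {
    max_size_mb: u64,
    max_days: u64,
}

struct Config {
    watch: WatchConfig,
    filter: FilterConfig,
    history: HistoryConfig,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            watch: WatchConfig {
                // Placeholder: the real default watch paths are not stated.
                paths: vec![".".to_string()],
                // Empty means "no extension restriction" in should_watch().
                extensions: Vec::new(),
                // Directory names taken from the article; exact glob syntax
                // used by 0diff is an assumption.
                ignore: ["target", "node_modules", ".git", "build", "dist"]
                    .iter()
                    .map(|s| s.to_string())
                    .collect(),
                debounce_ms: 500,
            },
            filter: FilterConfig {
                ignore_whitespace: true,
                min_lines_changed: 1,
            },
            history: HistoryConfig {
                max_size_mb: 50,
                max_days: 90,
            },
        }
    }
}
```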

The should_watch() function is the gatekeeper. Every filesystem event passes through it before any diff computation occurs:

pub fn should_watch(&self, path: &Path) -> bool {
    // 1. Check extension is in watch.extensions
    let ext = path.extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    if !self.watch.extensions.is_empty()
        && !self.watch.extensions.contains(&ext.to_string())
    {
        return false;
    }

    // 2. Check path starts with at least one watch.paths prefix
    let in_watch_path = self.watch.paths.iter().any(|p| {
        path.starts_with(p)
    });
    if !in_watch_path {
        return false;
    }

    // 3. Check path doesn't match any watch.ignore glob pattern
    for pattern in &self.watch.ignore {
        if glob_match(pattern, path) {
            return false;
        }
    }

    true
}

The three-step check is ordered by cost. Extension checking is a string comparison -- essentially free. Path prefix checking is slightly more expensive but still fast. Glob pattern matching is the most expensive operation and is only reached for files that pass the first two checks.
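The same cheap-to-expensive ordering can be tried out with the standard library alone. In the sketch below, the `WatchRules` type is hypothetical, and the glob step is swapped for a simpler directory-name match on path components, since `glob_match` is not shown in the article. It illustrates the ordering, not 0diff's ignore semantics.

```rust
use std::path::Path;

/// Hypothetical stand-in for the watch section of the config.
struct WatchRules {
    extensions: Vec<String>,
    paths: Vec<String>,
    /// Directory names to skip; a simplification of glob ignore patterns.
    ignore_dirs: Vec<String>,
}

impl WatchRules {
    fn should_watch(&self, path: &Path) -> bool {
        // 1. Cheapest: extension comparison (an empty list allows everything).
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
        if !self.extensions.is_empty() && !self.extensions.iter().any(|e| e == ext) {
            return false;
        }

        // 2. Component-wise prefix check against the watched roots.
        if !self.paths.iter().any(|p| path.starts_with(p)) {
            return false;
        }

        // 3. Most expensive: scan every path component for an ignored name.
        !path.components().any(|c| {
            c.as_os_str()
                .to_str()
                .map_or(false, |name| self.ignore_dirs.iter().any(|d| d == name))
        })
    }
}
```

Note that `Path::starts_with` compares whole components, so a watch path of `src` matches `src/main.rs` but not `src2/main.rs`.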

The default configuration ignores common noise directories (target/, node_modules/, .git/, build/, dist/) and common non-source extensions (images, binaries, lock files). A new user can run 0diff init and start watching immediately without any manual configuration. The generated .0diff.toml includes comments explaining every option, so customization is straightforward.

The config module includes 7 tests covering TOML parsing, default values, extension filtering, path prefix matching, and glob ignore patterns. These tests were written by agent-config during the initial build session and have caught several edge cases in subsequent development -- particularly around path separator handling on different operating systems.

The Test Suite

0diff has 44 tests across all modules. The distribution reflects where complexity lives:

config: 7 tests -- TOML parsing, watch rule evaluation, default values
differ: 8 tests -- empty diffs, additions only, deletions only, mixed changes, context lines
filter: 6 tests -- whitespace-only hunks, mixed hunks, mismatched add/delete counts, empty hunks
git: 9 tests -- commit parsing, author extraction, agent detection from commit messages, environment variables, TTY detection
history: 8 tests -- append, query by author, query by agent, rotation by size, rotation by age, JSON-lines format validation
watcher: 3 tests -- event handling, cache update, shutdown
display: 3 tests -- terminal output format, JSON output format, summary generation

The tests are unit tests, not integration tests. Each module is tested in isolation with constructed inputs. This was a practical decision for the initial build -- the 45-minute session did not have time for integration test infrastructure. The unit tests cover the important logic paths, and the modules' clean interfaces (structured data in, structured data out) mean that integration issues are rare.

Performance Characteristics

0diff is designed to be invisible. It should not slow down your development workflow or consume noticeable system resources.

Startup time: Under 50ms on a typical project. The main cost is seeding the file cache, which requires reading every watched file. For a project with 500 source files averaging 200 lines each, that is 100,000 lines, or about 10MB of I/O even at a generous 100 bytes per line -- trivial on any modern system.

Event handling latency: Sub-millisecond for the diff computation on typical file changes (under 1000 lines). The Myers algorithm runs in O(ND) time, where N is the combined number of lines in the old and new versions and D is the size of the shortest edit script between them. For the common case of small edits to medium-sized files, D is tiny and the diff completes in microseconds.

Memory usage: Proportional to the total size of watched files (for the cache) plus the in-memory portion of the debouncer's event buffer. Typically 20-100MB for a medium-sized project.

Disk usage: The JSON-lines history file grows at roughly 200-500 bytes per recorded change. At 100 changes per day, that is about 20-50KB/day, or roughly 7-18MB/year. The rotation system ensures the file never exceeds max_size_mb (default: 50MB) or max_days (default: 90 days).

The 2MB release binary includes everything -- no runtime dependencies beyond the OS-level filesystem notification API (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows) and the git command-line tool.

What We Learned

Building a file watcher taught us several things that are not obvious from documentation:

Filesystem events are unreliable. Different operating systems, different filesystems, and different editors produce different event sequences for the same logical operation. The debouncer handles most of this, but we still had to handle cases where we receive a modify event for a file that does not exist (because it was deleted between the event and our read) or a create event for a file that already exists in our cache (because the editor did a delete-recreate instead of a modify-in-place).
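One way to handle the "modify event for a file that no longer exists" case is to treat a missing file as a distinct outcome rather than an error. The sketch below is an assumption about how that could look, with `read_if_present` as a hypothetical helper; the article does not show how 0diff itself handles it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a file, mapping "not found" to `Ok(None)` so the caller can route the
/// event to the deletion path instead of failing. Other I/O errors propagate.
/// (Hypothetical helper, not taken from the 0diff source.)
fn read_if_present(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Some(contents)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```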

Cache invalidation is the real problem. The file cache must stay in sync with disk. If a file changes while we are processing another event, we could compute a diff against stale data. The debouncer helps by coalescing rapid changes, and the cache-update-after-processing pattern ensures we always record the transition from the last known state to the current state.

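Whitespace filtering is harder than it looks. The naive approach (strip all whitespace and compare) destroys too much information. A line that changes from if (x) to if ( x ) is a whitespace change. A line that changes from return 0 to return 1 is not. But a line that changes from return 0 to the same statement with a doubled internal space or a trailing space is. The paired-comparison approach -- matching each addition with its corresponding deletion and comparing normalized forms -- handles the indentation, trailing-space, and run-length cases correctly. One caveat follows from the normalization described earlier: because it collapses whitespace runs rather than removing them, a change that introduces spaces where there were none, such as if (x) to if ( x ), still registers as a real change.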

Configuration is a feature, not an afterthought. The config module is the largest module for a reason. A file watcher without proper ignore patterns will drown in noise from node_modules, build artifacts, and generated files. A diff recorder without a minimum change threshold will fill the history with single-character edits. Getting the defaults right and making customization easy is as important as the core functionality.

Series: How We Built 0diff.dev

This article is part of a four-part series on building 0diff:

Why We Built a Code Change Tracker for the AI Agent Era -- The problem, the solution, and the 45-minute build session
Real-Time File Watching and Diff Computation in Rust -- You are here
Detecting AI Agents in Your Codebase -- The agent detection system in detail
From 5 Agents to Production: Shipping 0diff in 20 Minutes -- The parallel agent workflow that built it all
