
Search Result Caching

How FLIN caches search results for faster repeated queries.

Thales & Claude | March 25, 2026 | 10 min read

Search is expensive. A hybrid search in FLIN involves embedding generation (converting the query text to a vector), HNSW index traversal (finding approximate nearest neighbors in high-dimensional space), BM25 scoring (computing term frequency and inverse document frequency), Reciprocal Rank Fusion (merging and re-ranking two result sets), and finally result serialization. Each of these steps has a measurable cost. For a single query, the total latency is acceptable -- 20 to 40 milliseconds. For the same query executed 100 times in a minute, the cumulative cost is waste.

Session 226 introduced FLIN's search result caching system: an LRU-based cache with TTL expiration and entity-aware invalidation. The system is built into the FLIN runtime, requiring zero configuration from the developer, and eliminates redundant search computations for repeated queries.

The Cache Architecture

The search cache stores results for all seven search function types that FLIN supports: basic search, multi-field search, search with snippets, deduplicated search, hybrid search, hybrid deduplicated search, and hybrid search with snippets. Each function type has its own storage within the cache, avoiding boxing overhead and enabling type-safe retrieval.

```rust
pub struct SearchCache {
    basic: HashMap<u64, SearchCacheEntry<Vec<SearchResult>>>,
    multi_field: HashMap<u64, SearchCacheEntry<Vec<SearchResult>>>,
    snippets: HashMap<u64, SearchCacheEntry<Vec<SnippetResult>>>,
    deduplicated: HashMap<u64, SearchCacheEntry<Vec<SearchResult>>>,
    hybrid: HashMap<u64, SearchCacheEntry<Vec<HybridResult>>>,
    hybrid_dedup: HashMap<u64, SearchCacheEntry<Vec<HybridResult>>>,
    hybrid_snippets: HashMap<u64, SearchCacheEntry<Vec<HybridSnippetResult>>>,
    config: SearchCacheConfig,
    stats: SearchCacheStats,
}

pub struct SearchCacheConfig {
    pub max_entries: usize,    // Default: 1,000
    pub default_ttl: Duration, // Default: 300 seconds
    pub enabled: bool,         // Default: true
}
```

The per-type storage is a deliberate design choice. An alternative approach would be a single HashMap<u64, Box<dyn Any>> that stores any result type. This would simplify the code but introduce two costs: boxing every cache entry (heap allocation on insert, downcast on retrieval) and losing compile-time type safety. With per-type storage, inserting a Vec<SearchResult> into the basic cache and retrieving it requires no boxing, no downcasting, and no runtime type checks.

Cache Keys

The cache key is a hash of the query string plus all search parameters. Two searches with the same query but different parameters (different entity type, different field, different limit) produce different cache keys:

```rust
pub struct SearchCacheKey {
    query: String,
    entity_type: String,
    field: String,
    limit: usize,
    additional: Option<AdditionalParams>,
}

#[derive(Hash)]
pub enum AdditionalParams {
    MultiField(Vec<String>),                 // Sorted field list
    Hybrid { alpha: HashableF32 },           // Semantic weight
    Deduplicated { threshold: HashableF32 },
}

// HashableF32: wrapper for f32 that implements Hash via bit representation
#[derive(Clone, Copy)]
pub struct HashableF32(f32);

impl Hash for HashableF32 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.to_bits().hash(state);
    }
}

impl PartialEq for HashableF32 {
    fn eq(&self, other: &Self) -> bool {
        self.0.to_bits() == other.0.to_bits()
    }
}

impl Eq for HashableF32 {}
```

The HashableF32 wrapper deserves explanation. Rust's f32 does not implement Hash because floating-point equality is complicated (NaN != NaN, +0.0 == -0.0). For cache keys, we use bit-level comparison: two floats are the same cache key if and only if their IEEE 754 bit representations are identical. This means +0.0 and -0.0 produce different cache keys, which is correct behavior for a cache -- different inputs should not share cache entries.

For multi-field search, the field list is sorted before hashing. This ensures that search("query", fields: ["title", "content"]) and search("query", fields: ["content", "title"]) produce the same cache key, because field order does not affect search results.

Cache Entries and TTL

Each cache entry stores the results, a creation timestamp, and an access timestamp for LRU eviction:

```rust
pub struct SearchCacheEntry<T> {
    results: T,
    created_at: Instant,
    last_accessed: Instant,
    ttl: Duration,
    hit_count: u64,
}

impl<T> SearchCacheEntry<T> {
    pub fn is_expired(&self) -> bool {
        self.created_at.elapsed() > self.ttl
    }

    pub fn touch(&mut self) {
        self.last_accessed = Instant::now();
        self.hit_count += 1;
    }
}
```

The default TTL is 300 seconds (5 minutes). After 5 minutes, a cache entry is considered stale and will be evicted on the next access. The TTL is configurable per-cache and can be adjusted based on how frequently the underlying data changes.

Why 5 minutes? It is a compromise between freshness and performance. For most search use cases -- autocomplete, faceted browsing, repeated queries from the same session -- 5 minutes of staleness is acceptable. When a document is created or modified, the entity-aware invalidation system (described below) immediately clears relevant cache entries, so the 5-minute TTL is a safety net, not the primary invalidation mechanism.

LRU Eviction

When the cache reaches its maximum entry count (default: 1,000 per type), the least recently used entry is evicted to make room for the new entry:

```rust
impl SearchCache {
    fn evict_lru_from<T>(
        cache: &mut HashMap<u64, SearchCacheEntry<T>>,
        stats: &mut SearchCacheStats,
        max_entries: usize, // from SearchCacheConfig (default: 1,000)
    ) {
        // First pass: remove expired entries
        let expired_keys: Vec<u64> = cache
            .iter()
            .filter(|(_, entry)| entry.is_expired())
            .map(|(key, _)| *key)
            .collect();

        for key in &expired_keys {
            cache.remove(key);
            stats.evictions += 1;
        }

        // If still at capacity, remove the least recently accessed entry
        if cache.len() >= max_entries {
            let lru_key = cache
                .iter()
                .min_by_key(|(_, entry)| entry.last_accessed)
                .map(|(key, _)| *key);

            if let Some(key) = lru_key {
                cache.remove(&key);
                stats.evictions += 1;
            }
        }
    }
}
```

The eviction process first removes expired entries (free space without losing valid data) and then falls back to LRU eviction if the cache is still full. This two-phase approach maximizes cache utilization: expired entries do not count against the capacity limit.

Entity-Aware Invalidation

The most important feature of the search cache is not what it caches but when it invalidates. When a document is created, updated, or deleted, the search results that reference that document's entity type become stale. The cache must be invalidated to prevent serving outdated results.

```rust
impl SearchCache {
    pub fn invalidate_entity(&mut self, entity_type: &str) {
        // The entry alone does not record its entity type, so a side
        // index maps each entity type to the cache keys whose results
        // came from that type. Invalidation is then a direct lookup
        // rather than a scan of every cache entry.
        if let Some(keys) = self.basic_entity_index.get(entity_type) {
            for key in keys {
                self.basic.remove(key);
            }
        }

        // Repeat for all cache types...
        self.stats.invalidations += 1;
    }
}
```

The invalidation is triggered automatically by the entity save, update, and delete operations in the VM. When a Document entity is saved, all search cache entries for entity type "Document" are cleared. This is an O(n) scan of the entity index, executed only when data changes -- not on every search.

The design decision to invalidate by entity type rather than by individual document is intentional. Invalidating a single document's cache entries would require tracking which cache entries contain which documents, which adds significant memory overhead and complexity. Since entity modifications are relatively infrequent compared to search queries, clearing all entries for the entity type is a reasonable trade-off.

The Wrapper Functions

FLIN's search module exposes 14 wrapper functions that integrate caching with the existing search API. Seven functions add caching to the base search functions, and seven combine caching with analytics:

```rust
pub fn search_documents_hybrid_cached_with_analytics(
    store: &VectorStore,
    bm25: &Bm25Index,
    query: &str,
    entity_type: &str,
    field: &str,
    limit: usize,
    alpha: f32,
    mut cache: Option<&mut SearchCache>,
    mut analytics: Option<&mut SearchAnalytics>,
) -> Result<Vec<HybridResult>, SearchError> {
    // Check cache first
    if let Some(ref mut c) = cache {
        if let Some(cached) = c.get_hybrid(query, entity_type, field, limit, alpha) {
            // Log cache hit in analytics
            if let Some(ref mut a) = analytics {
                a.log_search(query, "hybrid", cached.len(), 0, true);
            }
            return Ok(cached.to_vec());
        }
    }

    // Cache miss -- execute the actual search
    let start = Instant::now();
    let results = search_documents_hybrid(
        store, bm25, query, entity_type, field, limit, alpha,
    )?;
    let latency = start.elapsed().as_millis() as u64;

    // Store in cache
    if let Some(ref mut c) = cache {
        c.insert_hybrid(query, entity_type, field, limit, alpha, results.clone());
    }

    // Log in analytics
    if let Some(ref mut a) = analytics {
        a.log_search(query, "hybrid", results.len(), latency, false);
    }

    Ok(results)
}
```

The wrapper functions are transparent to the caller. If caching is enabled (the default), the first search for a query executes the full search pipeline and stores the results. Subsequent searches for the same query return the cached results immediately. If caching is disabled (via configuration or by passing None), the search executes normally every time.

Performance Impact

We measured the performance impact of caching on FLIN's benchmark application with 10,000 indexed documents:

| Scenario | Without Cache | With Cache (Hit) | Improvement |
|---|---|---|---|
| Basic search | 8 ms | < 1 ms | 8x+ faster |
| Multi-field search | 12 ms | < 1 ms | 12x+ faster |
| Hybrid search | 25 ms | < 1 ms | 25x+ faster |
| Search with snippets | 15 ms | < 1 ms | 15x+ faster |
| Deduplicated search | 18 ms | < 1 ms | 18x+ faster |

Cache hits are sub-millisecond because they involve only a hash lookup and a clone of the cached results. The improvement is proportional to the search complexity -- hybrid search benefits more from caching because it has higher base latency.

The cache hit rate depends on the application's query distribution. For applications where users search for similar terms (product catalogs, documentation sites, FAQ systems), hit rates above 80% are common. For applications with highly unique queries (full-text search of legal documents, scientific databases), hit rates may be lower, but the cache still eliminates exact duplicate queries.

Cache Statistics

The cache tracks comprehensive statistics for monitoring and tuning:

```rust
pub struct SearchCacheStats {
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
    pub invalidations: u64,
    pub entries_by_type: HashMap<String, usize>,
}

impl SearchCacheStats {
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}
```

These statistics are exposed through FLIN's admin console, allowing operators to monitor cache effectiveness and adjust configuration if needed. A consistently low hit rate suggests that the default TTL is too short or the max entries count is too low for the workload.

Integration with the Search Analytics System

The caching system integrates seamlessly with FLIN's search analytics (article 125). When analytics are enabled, cache hits are logged alongside cache misses. This provides accurate query frequency data regardless of caching -- the analytics system sees every search request, while the cache prevents redundant computation.

The analytics data also feeds back into cache tuning. Queries with high frequency and low result variability are ideal candidates for longer TTLs. Queries with high frequency but rapidly changing results need shorter TTLs. The analytics dashboard shows both metrics, enabling data-driven cache configuration.

Session 226 added 34 new tests covering cache key generation, TTL expiration, LRU eviction, entity invalidation, enable/disable toggling, and a full workflow integration test. Combined with the existing search and analytics tests, the search subsystem was among the most thoroughly tested in the FLIN codebase.

The total test count after Session 226 reached 3,225: 2,608 library tests and 617 integration tests. All passing.

---

This is Part 187 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [186] Error Resilience Patterns
- [187] Search Result Caching (you are here)
- [188] GC, CLI, and HTTP Integration Testing
