A database that can only find records is half a database. The other half is answering questions about records: How many orders were placed this month? What is the average order value? Which category has the highest revenue? These are aggregation queries, and they are the foundation of every dashboard, report, and analytics feature in every application.
SQL handles aggregations with SUM(), AVG(), COUNT(), GROUP BY, and HAVING. These functions are powerful but require writing SQL strings -- which, in FLIN's world, is exactly the kind of accidental complexity we set out to eliminate.
Session 162 added complete aggregation support to FlinDB. Six feature sets. Twelve tests. Three hundred and fifty lines of Rust. In two hours, FlinDB went from a CRUD store to an analytics engine.
## The Aggregation Functions
FlinDB supports four core aggregation functions, each operating on a named field across all entities (or a filtered subset).
```flin
// Sum: total of a numeric field
total_revenue = Order.sum("total") // 975.0

// Average: mean of a numeric field
avg_rating = Review.avg("rating") // 4.2

// Minimum: smallest value
cheapest = Product.min("price") // 25

// Maximum: largest value
most_expensive = Product.max("price") // 1500
```

Each function returns a single numeric value. They operate on the current (non-deleted) entities of a type, extracting the named field and computing the aggregate.
The Rust implementation follows a consistent pattern. Here is sum:
```rust
pub fn sum(&self, entity_type: &str, field: &str) -> DatabaseResult<f64> {
    let collection = self.data.get(entity_type)
        .ok_or(DatabaseError::EntityTypeNotFound)?;
    let mut total = 0.0;
    for versions in collection.values() {
        if let Some(entity) = versions.last() {
            if entity.deleted_at.is_some() { continue; }
            if let Some(value) = entity.fields.get(field) {
                match value {
                    Value::Int(n) => total += *n as f64,
                    Value::Number(n) => total += *n,
                    _ => {} // Non-numeric fields are skipped
                }
            }
        }
    }
    Ok(total)
}
```

The edge case handling is important. Non-numeric values are silently skipped rather than causing errors, and an entity with a missing field contributes zero to the sum. An empty collection returns 0.0 for sum and NaN for average. The skip-rather-than-fail behavior mirrors how SQL aggregates ignore NULLs; the empty-set results are the one deliberate departure, since SQL returns NULL where FlinDB returns a concrete 0.0 or NaN.
We tested the empty-set edge case explicitly:
```rust
#[test]
fn test_aggregation_empty_set() {
    let db = ZeroCore::new();
    db.register_schema("Product", schema);
    // No entities saved
    assert_eq!(db.sum("Product", "price").unwrap(), 0.0);
    assert!(db.avg("Product", "price").unwrap().is_nan());
}
```

## GROUP BY: Aggregates by Category
Raw aggregates answer simple questions. GROUP BY answers relational questions: "What is the total revenue per category?" or "What is the average salary per department?"
FlinDB implements GROUP BY through dedicated methods that combine grouping and aggregation:
```flin
// Revenue by category
revenue_by_category = Order.group_sum("category", "total")
// Returns: {"electronics": 15000, "books": 3000, "clothing": 8000}

// Average salary by department
avg_salary = Employee.group_avg("department", "salary")
// Returns: {"engineering": 95000, "design": 85000, "sales": 75000}

// Cheapest product per category
min_price = Product.group_min("category", "price")

// Most expensive product per category
max_price = Product.group_max("category", "price")
```

The group aggregation methods return a HashMap of group key to aggregate value. The Rust implementation collects values into buckets, then computes the aggregate per bucket:
```rust
pub fn group_sum(
    &self,
    entity_type: &str,
    group_field: &str,
    sum_field: &str,
) -> DatabaseResult<HashMap<String, f64>> {
    let collection = self.data.get(entity_type)
        .ok_or(DatabaseError::EntityTypeNotFound)?;
    let mut groups: HashMap<String, f64> = HashMap::new();
    for versions in collection.values() {
        if let Some(entity) = versions.last() {
            if entity.deleted_at.is_some() { continue; }
            let group_key = entity.fields.get(group_field)
                .map(|v| v.to_string())
                .unwrap_or_default();
            let value = match entity.fields.get(sum_field) {
                Some(Value::Int(n)) => *n as f64,
                Some(Value::Number(n)) => *n,
                _ => 0.0,
            };
            *groups.entry(group_key).or_insert(0.0) += value;
        }
    }
    Ok(groups)
}
```

The group key is converted to a string for the HashMap key. This works for text, integer, and boolean fields -- covering the vast majority of GROUP BY use cases. For complex grouping (multi-field groups or nested aggregations), the query builder provides more flexibility.
## DISTINCT: Unique Values
The DISTINCT operation returns unique values of a field across all entities:
```flin
categories = Product.distinct("category")
// Returns: ["electronics", "books", "clothing"]
```

This is essential for building filter UIs ("show me all available categories"), generating reports ("list all departments"), and data validation ("what values exist for this field?").
The implementation collects unique values into a set:
```rust
pub fn distinct(
    &self,
    entity_type: &str,
    field: &str,
) -> DatabaseResult<Vec<Value>> {
    let collection = self.data.get(entity_type)
        .ok_or(DatabaseError::EntityTypeNotFound)?;
    let mut seen = HashSet::new();
    let mut result = Vec::new();
    for versions in collection.values() {
        if let Some(entity) = versions.last() {
            if entity.deleted_at.is_some() { continue; }
            if let Some(value) = entity.fields.get(field) {
                let key = value.to_string();
                if seen.insert(key) {
                    result.push(value.clone());
                }
            }
        }
    }
    Ok(result)
}
```

## IN and NOT IN: Set Membership
SQL's IN and NOT IN operators filter rows based on set membership. FlinDB provides the same capability through query builder methods:
```flin
// IN: find orders with specific statuses
active = Order.where_in("status", ["pending", "processing"])

// NOT IN: exclude cancelled and refunded orders
valid = Order.where_not_in("status", ["cancelled", "refunded"])
```

These operators are implemented as query conditions that check whether a field's value is present (or absent) in a provided list:
```rust
pub fn where_in(
    mut self,
    field: &str,
    values: Vec<Value>,
) -> Self {
    self.conditions.push(QueryCondition::In {
        field: field.to_string(),
        values,
    });
    self
}

pub fn where_not_in(
    mut self,
    field: &str,
    values: Vec<Value>,
) -> Self {
    self.conditions.push(QueryCondition::NotIn {
        field: field.to_string(),
        values,
    });
    self
}
```

During query execution, In checks values.contains(&entity_value) and NotIn checks !values.contains(&entity_value). For small value lists (the common case), this is efficient. For large lists, the query optimizer could switch to a HashSet -- but in practice, IN clauses with more than a dozen values are rare in application code.
## Multiple ORDER BY
Session 162 also added support for multiple ordering criteria -- a feature that is trivial in SQL (ORDER BY category ASC, price DESC) but requires careful implementation in a non-SQL engine:
```flin
sorted = Product
    .order_by_asc("category")
    .order_by_desc("price")
```

This sorts products alphabetically by category, then within each category by price descending. The implementation accumulates ordering fields and applies them as a multi-key sort:
```rust
pub fn order_by_asc(mut self, field: &str) -> Self {
    self.orderings.push((field.to_string(), SortDirection::Asc));
    self
}

pub fn order_by_desc(mut self, field: &str) -> Self {
    self.orderings.push((field.to_string(), SortDirection::Desc));
    self
}
```

During execution, the result set is sorted using Rust's sort_by with a comparator that evaluates ordering fields in sequence:
```rust
results.sort_by(|a, b| {
    for (field, direction) in &self.orderings {
        let val_a = a.fields.get(field);
        let val_b = b.fields.get(field);
        let cmp = compare_values(val_a, val_b);
        match direction {
            SortDirection::Asc => {
                if cmp != Ordering::Equal { return cmp; }
            }
            SortDirection::Desc => {
                if cmp != Ordering::Equal { return cmp.reverse(); }
            }
        }
    }
    Ordering::Equal
});
```

Making this work required a bug fix: Value::Text comparison was not implemented. Text values could not be sorted alphabetically because the less_than() method on Value did not handle the Text variant. Session 162 added the missing comparison:
```rust
(Value::Text(a), Value::Text(b)) => Some(a < b)
```

Two lines of code. But without them, sorting by any text field would produce undefined results. This is the kind of subtle bug that only surfaces when you test real-world scenarios -- sorting products by category name rather than by price.
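The comparator pattern itself is easy to verify in isolation. The `sort_catalog` sketch below applies the same two-key logic (category ascending, then price descending) to plain tuples rather than entities, so it is illustrative rather than FlinDB code:

```rust
use std::cmp::Ordering;

// Standalone demo of the multi-key comparator pattern, using
// (category, price) tuples in place of entities (illustrative).
fn sort_catalog(items: &mut [(&str, i64)]) {
    items.sort_by(|a, b| {
        // First key: category ascending -- the string comparison that
        // the Value::Text fix enabled for entity sorting.
        let cmp = a.0.cmp(b.0);
        if cmp != Ordering::Equal {
            return cmp;
        }
        // Second key: price descending (operands swapped).
        b.1.cmp(&a.1)
    });
}

fn main() {
    let mut items = [("electronics", 500), ("books", 10), ("books", 30)];
    sort_catalog(&mut items);
    assert_eq!(items, [("books", 30), ("books", 10), ("electronics", 500)]);
}
```

Because sort_by is a stable sort, ties across all ordering keys keep their original relative order, which matters for deterministic query results.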
## A Real-World Analytics Example
To demonstrate the complete analytics capability, we created an e-commerce analytics example that exercises every aggregation feature:
```flin
entity Product {
    name: text
    category: text
    price: number
    sales: int
}

// Aggregate queries
total_revenue = Product.sum("sales")
avg_price = Product.avg("price")
cheapest = Product.min("price")
most_expensive = Product.max("price")

// Group analytics
revenue_by_category = Product.group_sum("category", "sales")
avg_price_by_category = Product.group_avg("category", "price")

// Filtered aggregates
electronics = Product.where(category == "electronics")
electronics_revenue = electronics.sum("sales")

// Distinct values
categories = Product.distinct("category")

// Complex query
top_products = Product
    .where_not_in("category", ["discontinued"])
    .order_by_desc("sales")
    .limit(10)
```

This example reads like a business analyst's requirements document. No SQL. No JOIN. No GROUP BY clause with column aliasing. Just methods that say what they do.
## The Twelve Tests
Session 162 added twelve tests covering every aggregation feature:
- test_aggregation_sum -- sum of numeric field
- test_aggregation_avg -- average of numeric field
- test_aggregation_min_max -- minimum and maximum
- test_aggregation_empty_set -- edge case with no entities
- test_group_sum -- GROUP BY with SUM
- test_group_avg -- GROUP BY with AVG
- test_group_min_max -- GROUP BY with MIN/MAX
- test_distinct -- unique value extraction
- test_where_in -- IN operator
- test_where_not_in -- NOT IN operator
- test_multiple_order_by -- multi-field sorting
- test_clear_order -- order clearing functionality
The total test count after Session 162: 2,111 (1,505 library + 606 integration). Each test validated not just the happy path but edge cases: empty collections, non-numeric fields, null values, and sorting stability.
## What Came Next
Session 162 brought FlinDB's query engine to feature parity with basic SQL analytics. But two critical gaps remained:
Indexes were not used. The @index annotation existed in the schema definition, but all queries still performed full table scans. For a database with a few thousand entities, this was fast enough. For production workloads with tens of thousands of entities, O(n) queries would become a bottleneck. Session 163 would fix this by implementing actual index data structures and integrating them into the query execution path.
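What such an index might look like, as a rough sketch (the FieldIndex type and its methods below are assumptions for illustration, not Session 163's actual code):

```rust
use std::collections::HashMap;

// Hypothetical sketch of a field index: a map from field value to entity
// ids, turning an equality query from an O(n) scan into one hash lookup.
// (Names and shape are assumptions, not FlinDB's actual API.)
struct FieldIndex {
    entries: HashMap<String, Vec<u64>>,
}

impl FieldIndex {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    // Would be called on every save to keep the index in sync with the data.
    fn insert(&mut self, value: &str, id: u64) {
        self.entries.entry(value.to_string()).or_default().push(id);
    }

    // Equality lookup: ids of all entities whose indexed field equals `value`.
    fn lookup(&self, value: &str) -> &[u64] {
        self.entries.get(value).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut index = FieldIndex::new();
    index.insert("electronics", 1);
    index.insert("books", 2);
    index.insert("electronics", 3);
    assert_eq!(index.lookup("electronics").to_vec(), vec![1, 3]);
    assert_eq!(index.lookup("missing").len(), 0);
}
```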
Relationships were not queryable. resolve_reference() existed but was not integrated with the query builder. You could not write Post.where(author == user) and have it resolve the relationship. Session 164 would add eager loading, reference queries, and inverse queries.
The aggregation engine was the last piece of the query foundation. With CRUD, constraints, and analytics in place, FlinDB had the expressiveness of a production database. The next challenge was making it fast.
This is Part 5 of the "How We Built FlinDB" series, documenting how we built a complete embedded database engine for the FLIN programming language.
Series Navigation:
- [056] FlinDB: Zero-Configuration Embedded Database
- [057] Entities, Not Tables: How FlinDB Thinks About Data
- [058] CRUD Without SQL
- [059] Constraints and Validation in FlinDB
- [060] Aggregations and Analytics (you are here)