#079 -- Validation and Sanitization Functions

Every web application has the same vulnerability: user input. A form field that expects an email address receives injected JavaScript. A search box that expects text receives an SQL query. A comment field that expects words receives a <script> tag. The first lines of defense are validation (rejecting invalid input) and sanitization (neutralizing dangerous input).

In most languages, this defense requires installing packages. validator for JavaScript. django.core.validators for Python. govalidator for Go. Each package has its own API, its own edge cases, and its own vulnerabilities. In FLIN, validation and sanitization are built into the language. Sessions 131 and 190 through 192 added 67 functions and constraint keywords that cover the validation and sanitization needs of a typical web application.

The Two Lines of Defense

FLIN separates validation (checking whether input is valid) from sanitization (making input safe). Both are necessary. Validation tells you whether to accept input. Sanitization ensures that accepted input cannot cause harm.

```flin
// Validation: is this input acceptable?
email = "user@example.com"
email.is_email              // true -- accept it

// Sanitization: make this input safe for display
comment = "<script>alert('xss')</script>Hello!"
safe = sanitize_html(comment)
// "Hello!" -- the script tag is removed
```

Validation rejects bad input at the boundary. Sanitization neutralizes input that passes validation but might still be dangerous in a specific context (HTML display, SQL queries, URL construction).

Validation Functions

Text Format Validation

```flin
text.is_email              // RFC 5322 simplified email check
text.is_url                // Valid URL with protocol
text.is_ip                 // IPv4 or IPv6 address
text.is_ipv4               // IPv4 address only
text.is_ipv6               // IPv6 address only
text.is_uuid               // UUID v4 format
text.is_json               // Valid JSON string
text.is_hex                // Hexadecimal characters only
text.is_base64             // Valid base64 string
```

Each validation function returns a boolean. No exceptions. No error objects. Just true or false. If you need to know why validation failed (for user-facing error messages), the entity constraint system provides that:

```flin
entity User {
    email: text where is_email  // Constraint with automatic error message
    age: int where >= 13 and <= 120
    username: text where len >= 3 and len <= 30
}

// When validation fails, you get a specific error for each failed constraint:
result = User.create(email: "not-an-email", age: 10, username: "alice")
// Error: "email: must be a valid email address"
// Error: "age: must be >= 13"
```
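The format validators above all reduce to a yes/no answer about the shape of a string. As an illustration only, here is a minimal Rust sketch of a UUID v4 format check written without regular expressions. The name is_uuid_v4, the acceptance of uppercase hex digits, and the exact variant rule are assumptions for this sketch, not FLIN's source.

```rust
/// Hypothetical sketch: checks the canonical 8-4-4-4-12 UUID text form with
/// version digit 4 and an RFC 4122 variant digit (8, 9, a, or b).
fn is_uuid_v4(s: &str) -> bool {
    let bytes = s.as_bytes();
    // The canonical form is exactly 36 ASCII characters
    if bytes.len() != 36 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &b)| match i {
        // Hyphens separate the five groups
        8 | 13 | 18 | 23 => b == b'-',
        // First digit of the third group is the version
        14 => b == b'4',
        // First digit of the fourth group encodes the variant
        19 => matches!(b.to_ascii_lowercase(), b'8' | b'9' | b'a' | b'b'),
        // Every other position must be a hex digit (either case)
        _ => b.is_ascii_hexdigit(),
    })
}
```

Checking bytes by position keeps the logic flat: one pass, no allocation, and each rule maps to one match arm.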

Content Validation

```flin
text.is_numeric            // All digits (0-9)
text.is_alpha              // All letters (a-z, A-Z)
text.is_alphanumeric       // Letters and digits only
text.is_integer            // Valid integer string ("-42", "0", "100")
text.is_float              // Valid float string ("3.14", "-0.5")
```

These functions validate string content without converting it. "42".is_numeric returns true without producing an integer value. This is the correct approach for form validation: first check that the input is valid, then convert it.
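The validate-then-convert order can be sketched in Rust. This sketch assumes an is_integer-style check that accepts an optional leading "-" followed by ASCII digits (consistent with the examples "-42", "0", and "100" above); parse_whole_number is a hypothetical helper, not a FLIN function.

```rust
/// Hypothetical is_integer-style check: an optional leading '-' followed by
/// one or more ASCII digits. It inspects the text and converts nothing.
fn is_integer(s: &str) -> bool {
    let digits = s.strip_prefix('-').unwrap_or(s);
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical helper showing the order: trim, validate the format, then convert.
fn parse_whole_number(input: &str) -> Result<i64, String> {
    let trimmed = input.trim();
    if !is_integer(trimmed) {
        return Err(format!("'{}' is not a whole number", trimmed));
    }
    // A valid digit string can still overflow i64, so conversion is checked too
    trimmed
        .parse::<i64>()
        .map_err(|_| format!("'{}' is out of range", trimmed))
}
```

The conversion can still fail after the format check passes, because a digit string may exceed the range of a 64-bit integer, so the sketch reports that case with its own message.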

Pattern Validation

```flin
text.matches(pattern)       // Regex match
text.is_phone              // International phone number format
text.is_credit_card        // Credit card number (Luhn check)
text.is_hex_color          // "#ff6b35" or "#f63"
text.is_slug               // URL slug (lowercase, hyphens, no spaces)
text.is_semver             // Semantic version ("1.2.3")
```

is_phone validates international phone numbers with country codes. It does not validate that the number exists -- only that its format is plausible. This is sufficient for most web applications, where phone verification happens via SMS.

is_credit_card performs a Luhn checksum validation. It does not validate that the card is active or has funds -- that requires a payment processor API call. But the Luhn check catches typos and random numbers, preventing unnecessary API calls.
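The Luhn check itself is short. The following Rust sketch implements the standard algorithm (double every second digit from the right, subtract 9 from any doubled value above 9, and require the total to be divisible by 10). Skipping spaces and hyphens and requiring 12 to 19 digits are assumptions about input handling, not documented FLIN behavior.

```rust
/// Sketch of the Luhn checksum. Spaces and hyphens are treated as separators
/// and skipped (an assumption); any other non-digit fails the check.
fn luhn_valid(input: &str) -> bool {
    let mut sum = 0u32;
    let mut count = 0usize;
    // Walk digits from the right; every second digit is doubled
    for c in input.chars().rev() {
        if c == ' ' || c == '-' {
            continue;
        }
        let mut d = match c.to_digit(10) {
            Some(d) => d,
            None => return false,
        };
        if count % 2 == 1 {
            d *= 2;
            // Same as adding the two digits of the doubled value (16 -> 1 + 6 = 7)
            if d > 9 {
                d -= 9;
            }
        }
        sum += d;
        count += 1;
    }
    // Assumed length bounds for payment card numbers: 12 to 19 digits
    (12..=19).contains(&count) && sum % 10 == 0
}
```

A single mistyped digit always changes the checksum, which is why the check catches most typos before any payment API is called.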

Length and Range Validation

```flin
text.len_between(3, 50)    // Length between 3 and 50 characters
text.len_min(3)            // At least 3 characters
text.len_max(50)           // At most 50 characters
n.between(1, 100)          // Number in range [1, 100]
n.positive                 // Greater than 0
n.negative                 // Less than 0
```

These are convenience wrappers around comparisons. text.len_between(3, 50) is equivalent to text.len >= 3 and text.len <= 50, but it reads better in validation chains and entity constraints.
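One detail such a wrapper has to pin down is what "length" means. In Rust, str::len counts UTF-8 bytes, so a sketch that means characters must count them explicitly. Counting Unicode scalar values, as below, is an assumption; FLIN's exact definition of len (bytes, scalar values, or grapheme clusters) is not specified here.

```rust
/// Hypothetical len_between: counts Unicode scalar values (chars), not bytes.
fn len_between(s: &str, min: usize, max: usize) -> bool {
    let n = s.chars().count();
    n >= min && n <= max
}

/// Hypothetical numeric range check, inclusive on both ends like [1, 100].
/// NaN compares false with everything, so it is never "between".
fn between(n: f64, lo: f64, hi: f64) -> bool {
    n >= lo && n <= hi
}
```

With byte counting, a three-letter name containing two accented characters would report a length of 5, which would surprise users who see three letters.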

Sanitization Functions

HTML Sanitization

HTML sanitization is the primary defense against Cross-Site Scripting (XSS) attacks. FLIN provides three levels:

```flin
// Level 1: Escape HTML entities (preserve all text, neutralize HTML)
html_escape("<script>alert('xss')</script>")
// "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"

// Level 2: Strip all HTML tags (keep text content only)
strip_tags("<b>Hello</b> <script>evil()</script> World")
// "Hello  World"

// Level 3: Sanitize HTML (keep safe tags, remove dangerous ones)
sanitize_html("<b>Bold</b> <script>evil()</script> <a href='url'>Link</a>")
// "<b>Bold</b>  <a href='url'>Link</a>"
```

html_escape converts every <, >, &, ", and ' to their HTML entity equivalents. This is the safest approach -- it preserves the input text but ensures nothing is interpreted as HTML.
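A character-by-character version of this escaping is sketched below in Rust. The entity used for the single quote is an assumption (&#39; here; &#x27; is equally valid HTML), and the sketch is illustrative rather than FLIN's implementation.

```rust
/// Sketch of html_escape: replaces the five HTML-significant characters
/// in a single pass, so no produced entity is ever escaped twice.
fn html_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}
```

Escaping in a single pass matters: a chain of string replacements must handle & first, or it will double-escape the entities it has just produced.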

strip_tags removes all HTML tags and returns only the text content; as the example shows, the contents of <script> elements are dropped along with the tags. This is useful when you need plain text from user input that might contain HTML (pasted from a rich text editor, for example).

sanitize_html is the most sophisticated option. It maintains a whitelist of safe tags (<b>, <i>, <a>, <p>, <br>, <ul>, <li>, <h1> through <h6>, etc.) and removes everything else. Attributes are also filtered -- href is allowed on <a> tags but onclick is stripped. style attributes are removed entirely to prevent CSS injection.

```flin
// Sanitize user-generated content for display
entity Comment {
    author: text
    body: text
    sanitized_body: text
}

fn create_comment(author: text, raw_body: text) {
    Comment.create(
        author: author,
        body: raw_body,                    // Store original
        sanitized_body: sanitize_html(raw_body)  // Store sanitized
    )
}
```
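Allowing href on <a> tags needs one more check, because a tag-and-attribute whitelist alone would still accept <a href="javascript:alert(1)">. How FLIN handles this is not covered here; a common approach, sketched below in Rust under that assumption, is to allow relative URLs plus a short list of schemes. The scheme list (http, https, mailto) and the name is_safe_href are hypothetical. The function expects the attribute value after HTML entity decoding, since browsers decode &#58; to ":" before parsing the URL.

```rust
/// Hypothetical check for an <a href> value (already HTML-entity-decoded):
/// relative URLs pass; absolute URLs pass only with an allowed scheme.
fn is_safe_href(value: &str) -> bool {
    // Browsers ignore leading whitespace and stray tabs/newlines in URLs, so
    // drop all whitespace and control characters (a conservative superset)
    let cleaned: String = value
        .chars()
        .filter(|c| !c.is_whitespace() && !c.is_control())
        .collect::<String>()
        .to_ascii_lowercase();
    // A ':' that appears before any '/', '?' or '#' marks a scheme
    match cleaned.find(|c: char| matches!(c, ':' | '/' | '?' | '#')) {
        Some(i) if cleaned.as_bytes()[i] == b':' => {
            matches!(&cleaned[..i], "http" | "https" | "mailto")
        }
        // No scheme: a relative URL such as "/about" or "page.html"
        _ => true,
    }
}
```

An allowlist of schemes is safer than a blocklist of known-bad ones such as javascript: and data:, because any scheme nobody thought of is rejected by default.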

SQL Injection Prevention

FLIN's entity system uses parameterized queries internally, which prevents SQL injection by design. But for developers who construct raw queries (an escape hatch for advanced use cases), sanitization functions are available:

```flin
// Escape a string for safe SQL inclusion
safe = escape_sql("Robert'; DROP TABLE users; --")
// "Robert''; DROP TABLE users; --"

// Escape for LIKE patterns
safe_pattern = escape_sql_like("100%")
// "100\%"
```

escape_sql doubles single quotes, which prevents the classic quote-breaking injection when the value is placed inside a single-quoted SQL string literal. escape_sql_like also escapes the % and _ wildcards, preventing pattern injection in LIKE queries.
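Both escapers fit in a few lines of Rust. This is a sketch, not FLIN's code: it assumes backslash as the LIKE escape character, which the query must declare with ESCAPE '\' on databases such as SQLite that have no default. Quote doubling is also only sufficient where backslash is an ordinary character inside string literals (standard SQL, SQLite, and PostgreSQL with its default standard_conforming_strings = on); MySQL's default backslash escaping is a known exception, which is one more reason to prefer parameterized queries.

```rust
/// Sketch of escape_sql: doubles single quotes so the value can sit inside a
/// single-quoted SQL string literal.
fn escape_sql(s: &str) -> String {
    s.replace('\'', "''")
}

/// Sketch of escape_sql_like: doubles quotes as above, then backslash-escapes
/// the escape character itself and the LIKE wildcards % and _.
/// The query must declare the escape character, e.g. LIKE '...' ESCAPE '\'.
fn escape_sql_like(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\'' => out.push_str("''"),
            '\\' | '%' | '_' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out
}
```

Escaping the backslash itself keeps a user-supplied "\" from swallowing the escape that follows it.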

In practice, FLIN developers should almost never need these functions. The entity system's query API is parameterized:

```flin
// Safe by default -- parameterized query
user = User.find_by(email: user_input)

// Also safe -- where clause is parameterized
users = User.where(name: search_term)
```

URL Sanitization

```flin
// Encode for URL parameters
url_encode("hello world & more")
// "hello%20world%20%26%20more"

// Decode URL parameters
url_decode("hello%20world")
// "hello world"

// Build a safe URL
base = "https://api.example.com/search"
query = url_encode(user_input)
safe_url = "{base}?q={query}"
```

URL encoding ensures that special characters in user input do not break URL structure or get interpreted as URL components.
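Percent-encoding a query-string component can be sketched in Rust as follows. The sketch assumes component-style encoding, where every byte outside RFC 3986's unreserved set (A-Z a-z 0-9 - . _ ~) is escaped; that assumption is consistent with the url_encode output shown above (space becomes %20, & becomes %26). Form-style decoding of "+" as a space is deliberately not handled.

```rust
/// Sketch of component-style percent-encoding: bytes outside the RFC 3986
/// unreserved set (A-Z a-z 0-9 - . _ ~) become %XX with uppercase hex.
fn url_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for &b in s.as_bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Sketch of the reverse: returns None for a malformed %XX escape or for
/// decoded bytes that are not valid UTF-8. '+' is left as-is, not a space.
fn url_decode(s: &str) -> Option<String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Exactly two hex digits must follow the '%'
            let hex = s.get(i + 1..i + 3)?;
            if !hex.bytes().all(|h| h.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Encoding works on UTF-8 bytes, so a non-ASCII character such as "é" becomes two escapes (%C3%A9) and decodes back to the same character.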

Path Traversal Prevention

```flin
// Sanitize a file path
safe_path = sanitize_path("../../etc/passwd")
// "etc/passwd" (relative traversal removed)

// Validate that a path is within a base directory
is_safe = path_within("/uploads", user_provided_path)
```

sanitize_path removes .. components and leading slashes, preventing directory traversal attacks. path_within checks that a resolved path stays within a specified base directory.
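A lexical version of both checks can be sketched with Rust's std::path components. This is an illustration under stated assumptions: Unix-style paths, an absolute base directory, and purely lexical resolution. It does not follow symlinks, so a production path_within would also canonicalize against the filesystem (for example with std::fs::canonicalize) before comparing.

```rust
use std::path::{Component, Path, PathBuf};

/// Sketch of sanitize_path: keeps only normal name components, dropping "..",
/// ".", and a leading "/". Lexical only; the filesystem is never touched.
fn sanitize_path(p: &str) -> String {
    Path::new(p)
        .components()
        .filter_map(|c| match c {
            Component::Normal(name) => Some(name.to_string_lossy().into_owned()),
            _ => None,
        })
        .collect::<Vec<String>>()
        .join("/")
}

/// Resolves "." and ".." lexically; ".." at the root stays at the root.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for c in path.components() {
        match c {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Sketch of path_within: joins the candidate onto the (absolute) base,
/// resolves it lexically, and checks the result is still under the base.
/// Path::join replaces the base entirely when the candidate is absolute.
fn path_within(base: &str, candidate: &str) -> bool {
    let base = normalize(Path::new(base));
    let resolved = normalize(&base.join(candidate));
    resolved.starts_with(&base)
}
```

Path::starts_with compares whole components, so "/uploads2/x" is correctly not treated as inside "/uploads", which a plain string prefix check would get wrong.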

Entity-Level Validation

FLIN's entity system integrates validation at the schema level using where constraints:

```flin
entity Product {
    name: text where len >= 1 and len <= 200
    description: text where len <= 5000
    price: float where > 0
    sku: text where matches("^[A-Z]{2}-\\d{4}$")
    email: text where is_email
    url: text? where is_url
    category: text where in ["electronics", "clothing", "food", "other"]
}
```

These constraints are checked automatically on create and update operations. If any constraint fails, the operation returns an error listing each failed constraint with a descriptive message. The developer does not need to write validation logic separately -- it is declared once in the entity definition and enforced everywhere.

```flin
// This fails validation automatically
result = Product.create(
    name: "",                    // Too short
    description: "A sample",     // Valid
    price: -5.0,                 // Not positive
    sku: "invalid",              // Does not match pattern
    email: "not-an-email",       // Not a valid email
    category: "other"            // Valid
)
// Errors:
// - "name: length must be >= 1"
// - "price: must be > 0"
// - "sku: must match pattern ^[A-Z]{2}-\d{4}$"
// - "email: must be a valid email address"
```
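Reporting every failure at once, as above, amounts to evaluating all rules instead of stopping at the first. A hypothetical Rust model of that behavior follows; the Rule type, string-valued fields, treating a missing field as an empty string, and the message wording are all assumptions, not FLIN internals.

```rust
use std::collections::HashMap;

/// One declared constraint: the field it applies to, a check, and a message.
struct Rule {
    field: &'static str,
    check: fn(&str) -> bool,
    message: &'static str,
}

/// Evaluates every rule and collects all failures instead of stopping early.
/// A missing field is checked as an empty string (an assumption).
fn validate(record: &HashMap<&str, &str>, rules: &[Rule]) -> Vec<String> {
    rules
        .iter()
        .filter(|rule| {
            let value = record.get(rule.field).copied().unwrap_or("");
            !(rule.check)(value)
        })
        .map(|rule| format!("{}: {}", rule.field, rule.message))
        .collect()
}

/// Hypothetical rules mirroring three of the Product constraints above.
fn product_rules() -> Vec<Rule> {
    vec![
        Rule {
            field: "name",
            check: |v: &str| (1..=200).contains(&v.chars().count()),
            message: "length must be between 1 and 200",
        },
        Rule {
            field: "price",
            check: |v: &str| v.parse::<f64>().map_or(false, |p| p > 0.0),
            message: "must be > 0",
        },
        Rule {
            // Hand-written equivalent of ^[A-Z]{2}-\d{4}$ (ASCII digits only)
            field: "sku",
            check: |v: &str| {
                let b = v.as_bytes();
                b.len() == 7
                    && b[..2].iter().all(u8::is_ascii_uppercase)
                    && b[2] == b'-'
                    && b[3..].iter().all(u8::is_ascii_digit)
            },
            message: "must match pattern ^[A-Z]{2}-\\d{4}$",
        },
    ]
}
```

Collecting every failure lets a form show all problems in one round trip instead of making the user fix them one at a time.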

Composing Validation and Sanitization

Real-world input handling combines validation and sanitization in sequence:

```flin
fn process_user_input(raw_name: text, raw_email: text, raw_bio: text) {
    // Step 1: Sanitize (make safe)
    name = raw_name.trim
    email = raw_email.trim.lower
    bio = sanitize_html(raw_bio)

    // Step 2: Validate (check acceptability)
    errors = []

    {if name.is_empty}
        errors.push("Name is required")
    {else if name.len > 100}
        errors.push("Name must be 100 characters or less")
    {/if}

    {if not email.is_email}
        errors.push("A valid email address is required")
    {/if}

    {if bio.len > 5000}
        errors.push("Bio must be 5000 characters or less")
    {/if}

    {if errors.len > 0}
        return { success: false, errors: errors }
    {/if}

    // Step 3: Persist (store validated, sanitized data)
    user = User.create(name: name, email: email, bio: bio)
    return { success: true, user: user }
}
```

The pattern is always the same: sanitize first (trim, normalize, remove dangerous content), validate second (check format and constraints), persist third (store the clean data). FLIN's built-in functions make each step a single line.

Implementation: Compiled Rust Validators

Each validation function is implemented as a Rust function that operates directly on the string's bytes. No regular expressions are used for the standard validations (email, URL, IP address). Instead, each validator is a hand-written check, which avoids regex compilation and backtracking and keeps every rule explicit and easy to audit.

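```rust
fn is_valid_email(s: &str) -> bool {
    // Simplified RFC 5322 validation: a practical subset, not the full grammar
    let at_pos = match s.find('@') {
        Some(pos) if pos > 0 => pos,
        _ => return false,
    };

    let local = &s[..at_pos];
    let domain = &s[at_pos + 1..];

    // No whitespace anywhere, and no second '@' (quoted local parts unsupported)
    if s.chars().any(char::is_whitespace) || domain.contains('@') {
        return false;
    }

    // Local part validation (non-empty is guaranteed by pos > 0)
    if local.len() > 64 {
        return false;
    }

    // Domain validation
    if domain.is_empty() || domain.len() > 253 {
        return false;
    }

    // Domain must contain a dot, and no label may be empty
    // (rejects "example", ".com", "example..com", "example.")
    if !domain.contains('.') || domain.split('.').any(str::is_empty) {
        return false;
    }

    // TLD must be at least 2 characters
    let tld = domain.rsplit('.').next().unwrap();
    if tld.chars().count() < 2 {
        return false;
    }

    true
}
```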

This email validator is not a complete RFC 5322 implementation (which would allow quoted strings, comments, and other rarely used features). It is a practical validator that accepts the addresses real users actually have and rejects obvious non-emails. Widely used validators, including those in validator.js, django.core.validators, and govalidator, make a similar trade-off: they check a practical subset of the grammar rather than all of it.

The HTML sanitizer uses a streaming parser that processes the input character by character, maintaining a stack of open tags and a whitelist of allowed tags and attributes. It never builds a DOM tree in memory: beyond the output itself, it holds only the stack of open tags, so large inputs do not create a second in-memory copy of the document.
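The single-pass idea can be illustrated with a much simpler relative of the sanitizer: a strip_tags-style scanner that drops every tag and skips the contents of <script> and <style> elements, reproducing the strip_tags example earlier in this article. It is a hedged Rust sketch, not FLIN's parser: it has no whitelist or tag stack, and it ignores comments, CDATA, ">" inside quoted attribute values, and a bare "<" in text.

```rust
/// Hypothetical single-pass tag stripper: copies text, drops every tag, and
/// skips everything between <script>/<style> and their closing tags.
fn strip_tags(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    // Name of the raw-text element being skipped ("script" or "style"), if any
    let mut skipping: Option<String> = None;

    while let Some(c) = chars.next() {
        if c != '<' {
            if skipping.is_none() {
                out.push(c);
            }
            continue;
        }
        // Consume the tag body up to '>' (or to the end of the input)
        let mut tag = String::new();
        for t in chars.by_ref() {
            if t == '>' {
                break;
            }
            tag.push(t);
        }
        let closing = tag.starts_with('/');
        let name = tag
            .trim_start_matches('/')
            .chars()
            .take_while(|ch| ch.is_ascii_alphanumeric())
            .collect::<String>()
            .to_ascii_lowercase();

        if skipping.is_some() {
            // Inside script/style, only the matching closing tag ends the skip
            if closing && skipping.as_deref() == Some(name.as_str()) {
                skipping = None;
            }
        } else if !closing && (name == "script" || name == "style") {
            skipping = Some(name);
        }
    }
    out
}
```

At any moment the scanner holds only the output, the text of the current tag, and one skip marker, which is the memory profile a streaming design aims for.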

What We Validate vs. What We Sanitize

The distinction between validation and sanitization is often blurred. Here is FLIN's clear separation:

Validation functions answer the question "is this input acceptable?" They return booleans and do not modify the input:

is_email, is_url, is_ip, is_uuid
is_numeric, is_alpha, is_alphanumeric
is_phone, is_credit_card, is_hex_color
len_between, between, matches

Sanitization functions answer the question "how do I make this input safe?" They return modified strings:

html_escape, strip_tags, sanitize_html
escape_sql, escape_sql_like
url_encode, url_decode
sanitize_path
trim, lower, upper (normalization)

Using both together is the correct pattern: sanitize input before storage, validate input before processing, sanitize output before display. FLIN makes all three steps trivial.

Sixty-Seven Functions, Zero Dependencies

The complete validation and sanitization API:

15 format validators (email, URL, IP, UUID, phone, credit card, etc.)
5 content validators (numeric, alpha, alphanumeric, empty, blank)
6 length/range validators (len_between, len_min, len_max, between, etc.)
1 pattern validator (matches)
3 HTML sanitizers (html_escape, strip_tags, sanitize_html)
1 HTML unescaper (html_unescape)
2 SQL sanitizers (escape_sql, escape_sql_like)
2 URL encoders (url_encode, url_decode)
2 path sanitizers (sanitize_path, path_within)
2 base64 coders (base64_encode, base64_decode)
28 entity constraint keywords (where, in, between, matches, etc.)

Sixty-seven functions and keywords that cover the ground of validator, sanitize-html, xss, dompurify, express-validator, and similar security-related packages. All built in, with nothing to install. All making the secure path the default path.

This is Part 79 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO built input validation and sanitization directly into a programming language.

Series Navigation:
- [78] Reduce, Map, Filter: Higher-Order Functions
- [79] Validation and Sanitization Functions (you are here)
- [80] Error Tracking and Performance Monitoring
- [81] FlinUI: Zero-Import Component System