#114 -- 75 Security Tests: How We Verified Everything

Building security features is half the battle. Proving they work is the other half. A hash function that silently returns an empty string. A rate limiter that resets on every request. A guard that passes when it should fail. A JWT verifier that accepts expired tokens. Each of these bugs is invisible in normal operation and catastrophic in an attack scenario.

We wrote 75 security-focused tests for the FLIN runtime, organized into ten categories. Every test is a Rust #[test] function that exercises a specific security behavior and asserts that it works correctly. The security logic itself is never mocked or stubbed: real cryptographic operations, real token generation, real rate limit counters.

This article documents the test categories, shows representative tests from each, and explains the testing philosophy that guided us.

Test Organization

The 75 tests are organized into ten categories:

```
tests/security/
    password_hashing.rs      -- 12 tests
    jwt_tokens.rs            -- 10 tests
    rate_limiting.rs         -- 8 tests
    guards.rs                -- 10 tests
    csrf.rs                  -- 5 tests
    input_validation.rs      -- 12 tests
    session_security.rs      -- 6 tests
    security_headers.rs      -- 4 tests
    file_upload_security.rs  -- 4 tests
    cryptographic_safety.rs  -- 4 tests
```

Each test file focuses on one security domain and tests both the happy path (feature works correctly) and the attack path (feature resists misuse).

Category 1: Password Hashing (12 Tests)

The password hashing tests verify Argon2id behavior end to end:

```rust
#[test]
fn test_hash_produces_argon2id_format() {
    let hash = hash_password("test-password").unwrap();
    assert!(hash.starts_with("$argon2id$"));
}

#[test]
fn test_correct_password_verifies() {
    let hash = hash_password("correct-horse-battery-staple").unwrap();
    assert!(verify_password("correct-horse-battery-staple", &hash).unwrap());
}

#[test]
fn test_wrong_password_fails() {
    let hash = hash_password("correct-password").unwrap();
    assert!(!verify_password("wrong-password", &hash).unwrap());
}

#[test]
fn test_same_password_different_hashes() {
    let hash1 = hash_password("same-password").unwrap();
    let hash2 = hash_password("same-password").unwrap();
    assert_ne!(hash1, hash2); // Different salts -> different hashes
}

#[test]
fn test_empty_password_hashes() {
    // Empty passwords should still hash (some apps allow them for OAuth users)
    let hash = hash_password("").unwrap();
    assert!(verify_password("", &hash).unwrap());
    assert!(!verify_password("not-empty", &hash).unwrap());
}

#[test]
fn test_unicode_password() {
    // Non-ASCII characters must survive hashing and verification unchanged
    let password = "mot de passe avec des caractères spéciaux éèêàçù";
    let hash = hash_password(password).unwrap();
    assert!(verify_password(password, &hash).unwrap());
}
```

The test for different hashes from the same password is critical: it verifies that salting works. If two hashes of "password" are identical, salt generation is broken, which makes precomputed rainbow table attacks possible and reveals which users share a password.
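The salting property can be illustrated without Argon2id. The sketch below is a toy, not FLIN's code: toy_digest, toy_hash, toy_verify, and the salt$digest record format are invented for illustration, it uses std's DefaultHasher (SipHash, which is fast and is not a password hash), and it takes the salt as a parameter instead of drawing it from a cryptographically secure random source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy digest of salt || password. NOT a password hash: real code uses a
// slow, memory-hard function such as Argon2id.
fn toy_digest(salt: u64, password: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    salt.hash(&mut hasher);
    password.hash(&mut hasher);
    hasher.finish()
}

// Store the salt next to the digest so verification can recompute it.
// Hypothetical format: "<salt hex>$<digest hex>".
fn toy_hash(salt: u64, password: &str) -> String {
    format!("{:016x}${:016x}", salt, toy_digest(salt, password))
}

fn toy_verify(password: &str, stored: &str) -> bool {
    let Some((salt_hex, digest_hex)) = stored.split_once('$') else {
        return false;
    };
    match (u64::from_str_radix(salt_hex, 16), u64::from_str_radix(digest_hex, 16)) {
        // Recompute with the stored salt and compare digests.
        (Ok(salt), Ok(digest)) => toy_digest(salt, password) == digest,
        _ => false,
    }
}

fn main() {
    // Two users choose the same password but receive different salts
    // (in real code, salts come from a CSPRNG).
    let a = toy_hash(0x1111, "same-password");
    let b = toy_hash(0x2222, "same-password");
    assert_ne!(a, b); // different salts -> different stored records
    assert!(toy_verify("same-password", &a));
    assert!(toy_verify("same-password", &b));
    assert!(!toy_verify("wrong-password", &a));
    println!("{a}\n{b}");
}
```

Because the salt is stored with the digest, verification still works, yet a table precomputed for one salt is useless against every other salt.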

Category 2: JWT Tokens (10 Tests)

JWT tests verify creation, verification, expiration, and tampering detection:

```rust
#[test]
fn test_create_and_verify_token() {
    let user = mock_user(42, "admin");
    let token = create_token(&user, &TokenOptions {
        expires: Duration::hours(1),
        claims: HashMap::new(),
    }).unwrap();

    let claims = verify_token(&token, &TEST_SECRET).unwrap();
    assert_eq!(claims.sub, "42");
}

#[test]
fn test_expired_token_rejected() {
    let user = mock_user(42, "admin");
    let token = create_token(&user, &TokenOptions {
        expires: Duration::seconds(-1), // Already expired
        claims: HashMap::new(),
    }).unwrap();

    assert!(verify_token(&token, &TEST_SECRET).is_none());
}

#[test]
fn test_tampered_token_rejected() {
    let user = mock_user(42, "admin");
    let token = create_token(&user, &TokenOptions::default()).unwrap();

    // Tamper with the payload
    let parts: Vec<&str> = token.splitn(3, '.').collect();
    let tampered = format!("{}.{}{}.{}", parts[0], parts[1], "tampered", parts[2]);

    assert!(verify_token(&tampered, &TEST_SECRET).is_none());
}

#[test]
fn test_wrong_secret_rejected() {
    let user = mock_user(42, "admin");
    let token = create_token(&user, &TokenOptions::default()).unwrap();

    assert!(verify_token(&token, b"wrong-secret").is_none());
}

#[test]
fn test_custom_claims_preserved() {
    let user = mock_user(42, "admin");
    let mut custom = HashMap::new();
    custom.insert("role".into(), Value::String("admin".into()));
    custom.insert("org_id".into(), Value::Int(7));

    let token = create_token(&user, &TokenOptions {
        expires: Duration::hours(1),
        claims: custom,
    }).unwrap();

    let claims = verify_token(&token, &TEST_SECRET).unwrap();
    assert_eq!(claims.get("role"), Some(&Value::String("admin".into())));
    assert_eq!(claims.get("org_id"), Some(&Value::Int(7)));
}
```

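The tampered token test is especially important. It verifies that altering the payload (here, by appending characters to it) causes the token to be rejected, so a client cannot rewrite its own claims, such as the sub user ID, and still pass verification.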
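Why tampering and a wrong secret both fail becomes clearer from what a JWT signature covers: the signing input header.payload, signed with the secret. The sketch below mirrors only that structure and is not FLIN's implementation. It swaps HMAC-SHA256 for a keyed DefaultHasher (not a real MAC, not secure), keeps the payload as plain sub=...;exp=... text instead of base64url-encoded JSON, and the names sign, verify, and Claims are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy keyed hash over secret || signing input. Illustration only:
// real JWTs (HS256) use HMAC-SHA256.
fn toy_mac(secret: &[u8], signing_input: &str) -> String {
    let mut hasher = DefaultHasher::new();
    secret.hash(&mut hasher);
    signing_input.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

#[derive(Debug, PartialEq)]
struct Claims {
    sub: String,
    exp: u64, // expiry as Unix seconds
}

fn sign(sub: &str, exp: u64, secret: &[u8]) -> String {
    let signing_input = format!("toy.sub={sub};exp={exp}");
    let sig = toy_mac(secret, &signing_input);
    format!("{signing_input}.{sig}")
}

fn verify(token: &str, secret: &[u8], now: u64) -> Option<Claims> {
    let parts: Vec<&str> = token.split('.').collect();
    if parts.len() != 3 {
        return None;
    }
    // 1. Check the signature first: any edit to header or payload changes the
    //    signing input. (A real verifier compares signatures in constant time.)
    let signing_input = format!("{}.{}", parts[0], parts[1]);
    if toy_mac(secret, &signing_input) != parts[2] {
        return None;
    }
    // 2. Only then parse the payload.
    let (mut sub, mut exp) = (None, None);
    for field in parts[1].split(';') {
        match field.split_once('=') {
            Some(("sub", v)) => sub = Some(v.to_string()),
            Some(("exp", v)) => exp = v.parse::<u64>().ok(),
            _ => return None,
        }
    }
    let claims = Claims { sub: sub?, exp: exp? };
    // 3. A correctly signed but expired token is still rejected.
    if claims.exp <= now {
        return None;
    }
    Some(claims)
}

fn main() {
    let secret: &[u8] = b"test-secret";
    let token = sign("42", 2_000, secret);
    assert_eq!(verify(&token, secret, 1_000).map(|c| c.sub), Some("42".to_string()));

    // Same tampering as the article's test: append text to the payload.
    let parts: Vec<&str> = token.splitn(3, '.').collect();
    let tampered = format!("{}.{}{}.{}", parts[0], parts[1], "tampered", parts[2]);
    assert!(verify(&tampered, secret, 1_000).is_none());
    assert!(verify(&token, b"wrong-secret", 1_000).is_none());
    assert!(verify(&token, secret, 2_000).is_none()); // expired
}
```

Checking the signature before reading any claim is the design point: nothing in the payload is trusted until the signature over it has been confirmed.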

Category 3: Rate Limiting (8 Tests)

Rate limiting tests verify counting, window behavior, and reset logic:

```rust
#[test]
fn test_allows_within_limit() {
    let mut limiter = RateLimiter::new();
    for _ in 0..5 {
        assert!(limiter.check("test-ip", 5, 60).is_allowed());
    }
}

#[test]
fn test_blocks_over_limit() {
    let mut limiter = RateLimiter::new();
    for _ in 0..5 {
        limiter.check("test-ip", 5, 60);
    }
    let result = limiter.check("test-ip", 5, 60);
    assert!(!result.is_allowed());
    assert!(result.retry_after() > 0);
}

#[test]
fn test_different_keys_independent() {
    let mut limiter = RateLimiter::new();
    for _ in 0..5 {
        limiter.check("ip-a", 5, 60);
    }
    // ip-b should still be allowed
    assert!(limiter.check("ip-b", 5, 60).is_allowed());
}

#[test]
fn test_remaining_count_decreases() {
    let mut limiter = RateLimiter::new();
    let r1 = limiter.check("ip", 10, 60);
    assert_eq!(r1.remaining(), 9);

    let r2 = limiter.check("ip", 10, 60);
    assert_eq!(r2.remaining(), 8);
}
```
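These tests pin down an interface: check(key, limit, window_secs) returns a result exposing is_allowed(), remaining(), and retry_after(). One way to satisfy them is a fixed-window counter per key. The sketch below is an assumption, not FLIN's implementation: the CheckResult type and the check_at method are invented, and check_at takes the current time in seconds as an explicit now argument so that window expiry and reset can be tested deterministically.

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct CheckResult {
    allowed: bool,
    remaining: u32,
    retry_after: u64, // seconds until the window resets; 0 when allowed
}

impl CheckResult {
    fn is_allowed(&self) -> bool { self.allowed }
    fn remaining(&self) -> u32 { self.remaining }
    fn retry_after(&self) -> u64 { self.retry_after }
}

struct RateLimiter {
    // key -> (window start in seconds, requests counted in this window)
    windows: HashMap<String, (u64, u32)>,
}

impl RateLimiter {
    fn new() -> Self {
        RateLimiter { windows: HashMap::new() }
    }

    fn check_at(&mut self, key: &str, limit: u32, window_secs: u64, now: u64) -> CheckResult {
        let entry = self.windows.entry(key.to_string()).or_insert((now, 0));
        // Reset the counter once the current window has elapsed.
        if now >= entry.0 + window_secs {
            *entry = (now, 0);
        }
        if entry.1 < limit {
            entry.1 += 1;
            CheckResult { allowed: true, remaining: limit - entry.1, retry_after: 0 }
        } else {
            // now < window start + window_secs here, so this cannot underflow.
            CheckResult { allowed: false, remaining: 0, retry_after: entry.0 + window_secs - now }
        }
    }
}

fn main() {
    let mut limiter = RateLimiter::new();
    for _ in 0..5 {
        assert!(limiter.check_at("test-ip", 5, 60, 100).is_allowed());
    }
    let blocked = limiter.check_at("test-ip", 5, 60, 110);
    assert!(!blocked.is_allowed());
    assert_eq!(blocked.retry_after(), 50); // window opened at 100, closes at 160
    assert!(limiter.check_at("other-ip", 5, 60, 110).is_allowed()); // keys are independent
    assert!(limiter.check_at("test-ip", 5, 60, 160).is_allowed()); // new window
}
```

A fixed window is the simplest choice, but it lets a client send up to twice the limit across a window boundary; sliding-window or token-bucket limiters smooth that out at the cost of more state.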

Category 4: Guards (10 Tests)

Guard tests verify that each guard type correctly allows or rejects requests:

```rust
#[test]
fn test_auth_guard_rejects_unauthenticated() {
    let ctx = RequestContext::anonymous();
    let result = guard_auth(&ctx, &[]);
    assert!(matches!(result, GuardResult::Fail(_)));
}

#[test]
fn test_auth_guard_accepts_session() {
    let mut ctx = RequestContext::anonymous();
    ctx.session.set("user", "user@example.com");
    let result = guard_auth(&ctx, &[]);
    assert!(matches!(result, GuardResult::Pass));
}

#[test]
fn test_role_guard_rejects_wrong_role() {
    let mut ctx = RequestContext::authenticated("user");
    ctx.set_role("viewer");
    let params = vec![Value::String("admin".into())];
    let result = guard_role(&ctx, &params);
    assert!(matches!(result, GuardResult::Fail(_)));
}

#[test]
fn test_role_guard_accepts_matching_role() {
    let mut ctx = RequestContext::authenticated("admin");
    ctx.set_role("admin");
    let params = vec![Value::String("admin".into()), Value::String("superadmin".into())];
    let result = guard_role(&ctx, &params);
    assert!(matches!(result, GuardResult::Pass));
}
```
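The role guard tests imply simple semantics: the guard's parameters list the acceptable roles, and the request passes if its role matches any of them. A minimal sketch under that assumption, using simplified stand-ins for RequestContext, Value, and GuardResult (the real FLIN types are not shown here, so these fields and error messages are hypothetical):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
}

#[derive(Debug, PartialEq)]
enum GuardResult {
    Pass,
    Fail(String), // human-readable reason for rejecting the request
}

struct RequestContext {
    user: Option<String>,
    role: Option<String>,
}

fn guard_auth(ctx: &RequestContext, _params: &[Value]) -> GuardResult {
    match &ctx.user {
        Some(_) => GuardResult::Pass,
        None => GuardResult::Fail("authentication required".into()),
    }
}

fn guard_role(ctx: &RequestContext, params: &[Value]) -> GuardResult {
    // Fail closed: no authenticated user or no assigned role means no access.
    if ctx.user.is_none() {
        return GuardResult::Fail("authentication required".into());
    }
    let Some(role) = ctx.role.as_deref() else {
        return GuardResult::Fail("no role assigned".into());
    };
    // Pass if the request's role matches any role listed in the guard params.
    let allowed = params
        .iter()
        .any(|p| matches!(p, Value::String(r) if r.as_str() == role));
    if allowed {
        GuardResult::Pass
    } else {
        GuardResult::Fail(format!("role '{role}' not permitted"))
    }
}

fn main() {
    let viewer = RequestContext { user: Some("user".into()), role: Some("viewer".into()) };
    let admin = RequestContext { user: Some("admin".into()), role: Some("admin".into()) };
    let anon = RequestContext { user: None, role: None };
    let params = vec![Value::String("admin".into()), Value::String("superadmin".into())];

    assert!(matches!(guard_role(&viewer, &params), GuardResult::Fail(_)));
    assert_eq!(guard_role(&admin, &params), GuardResult::Pass);
    assert!(matches!(guard_auth(&anon, &[]), GuardResult::Fail(_)));
    assert!(matches!(guard_role(&anon, &params), GuardResult::Fail(_)));
}
```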

Category 5: CSRF Protection (5 Tests)

```rust
#[test]
fn test_csrf_token_unique_per_session() {
    let session1 = Session::new();
    let session2 = Session::new();
    let token1 = generate_csrf_token(&session1);
    let token2 = generate_csrf_token(&session2);
    assert_ne!(token1, token2);
}

#[test]
fn test_csrf_validation_rejects_wrong_token() {
    let session = Session::new();
    let _token = generate_csrf_token(&session);
    assert!(!validate_csrf(&session, "wrong-token"));
}
```
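The validation side is where CSRF checks tend to go wrong: a session with no stored token must never match an empty submission, and the comparison should not stop at the first differing byte. A sketch of that rule, assuming the session stores its token as an Option<String>; Session and validate_csrf here are simplified, hypothetical stand-ins, and token generation (which needs a cryptographically secure random source) is omitted.

```rust
struct Session {
    csrf_token: Option<String>,
}

// Compare every byte instead of returning at the first mismatch, so timing does
// not reveal how long a matching prefix is. The length check can still leak
// length, which is acceptable for fixed-length tokens.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn validate_csrf(session: &Session, submitted: &str) -> bool {
    match session.csrf_token.as_deref() {
        Some(expected) if !expected.is_empty() && !submitted.is_empty() => {
            constant_time_eq(expected.as_bytes(), submitted.as_bytes())
        }
        // Fail closed: a missing or empty token never validates.
        _ => false,
    }
}

fn main() {
    let session = Session { csrf_token: Some("a1b2c3d4".into()) };
    assert!(validate_csrf(&session, "a1b2c3d4"));
    assert!(!validate_csrf(&session, "wrong-token"));
    assert!(!validate_csrf(&session, "a1b2c3d5"));

    let no_token = Session { csrf_token: None };
    assert!(!validate_csrf(&no_token, ""));
}
```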

Category 6: Input Validation (12 Tests)

```rust
#[test]
fn test_required_field_rejects_empty() {
    let fields = vec![ValidateField::new("name", FieldType::Text).required()];
    let body = json!({});
    let result = validate_body(&body, &fields);
    assert!(result.is_err());
    assert!(result.unwrap_err().fields.contains_key("name"));
}

#[test]
fn test_email_validator_rejects_invalid() {
    let fields = vec![ValidateField::new("email", FieldType::Text).email()];
    let body = json!({"email": "not-an-email"});
    let result = validate_body(&body, &fields);
    assert!(result.is_err());
}

#[test]
fn test_email_validator_accepts_valid() {
    let fields = vec![ValidateField::new("email", FieldType::Text).email()];
    let body = json!({"email": "user@example.com"});
    let result = validate_body(&body, &fields);
    assert!(result.is_ok());
}

#[test]
fn test_min_length_validator() {
    let fields = vec![ValidateField::new("name", FieldType::Text).min_length(3)];
    let body = json!({"name": "AB"});
    let result = validate_body(&body, &fields);
    assert!(result.is_err());

    let body = json!({"name": "ABC"});
    let result = validate_body(&body, &fields);
    assert!(result.is_ok());
}

#[test]
fn test_type_coercion_int_from_string() {
    let fields = vec![ValidateField::new("age", FieldType::Int).min(0.0).max(150.0)];
    let body = json!({"age": "25"});
    let result = validate_body(&body, &fields).unwrap();
    assert_eq!(result.get("age"), Some(&Value::Int(25)));
}
```
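The coercion test shows that a numeric string such as "25" is accepted for an Int field and converted to an integer. A std-only sketch of that rule, assuming a simplified Input enum in place of a JSON value and assuming bounds are checked after coercion; coerce_int, Input, and the error strings are hypothetical names, not FLIN's API.

```rust
#[derive(Debug, PartialEq)]
enum Input {
    Str(String),
    Int(i64),
}

fn coerce_int(field: &str, raw: &Input, min: f64, max: f64) -> Result<i64, String> {
    let n = match raw {
        Input::Int(n) => *n,
        // Accept numeric strings like "25"; reject anything that is not a whole number.
        Input::Str(s) => s
            .parse::<i64>()
            .map_err(|_| format!("{field}: expected an integer"))?,
    };
    // Checking bounds after coercion means "200" fails max(150) exactly like 200 does.
    if (n as f64) < min || (n as f64) > max {
        return Err(format!("{field}: must be between {min} and {max}"));
    }
    Ok(n)
}

fn main() {
    assert_eq!(coerce_int("age", &Input::Str("25".into()), 0.0, 150.0), Ok(25));
    assert_eq!(coerce_int("age", &Input::Int(30), 0.0, 150.0), Ok(30));
    assert!(coerce_int("age", &Input::Str("twenty".into()), 0.0, 150.0).is_err());
    assert!(coerce_int("age", &Input::Str("200".into()), 0.0, 150.0).is_err());
}
```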

Categories 7-10: Sessions, Headers, File Uploads, Crypto

The remaining categories cover session encryption, security header presence, directory traversal prevention, and constant-time comparison:

```rust
#[test]
fn test_session_cookie_is_encrypted() {
    let mut session = Session::new();
    session.set("secret", "sensitive-data");
    let cookie = session.to_cookie(&ENCRYPTION_KEY);
    // Cookie should not contain plaintext
    assert!(!cookie.contains("sensitive-data"));
}

#[test]
fn test_security_headers_present_in_production() {
    let response = make_request_in_production_mode("/api/test");
    assert_eq!(response.header("X-Frame-Options"), Some("DENY"));
    assert_eq!(response.header("X-Content-Type-Options"), Some("nosniff"));
    assert!(response.header("Content-Security-Policy").is_some());
}

#[test]
fn test_directory_traversal_blocked() {
    let result = serve_static("../../etc/passwd", Path::new("/app/public"));
    assert!(result.is_none()); // Path traversal blocked
}

#[test]
fn test_constant_time_eq_same_length() {
    // This test verifies the function exists and works
    // It cannot test timing properties in a unit test
    assert!(constant_time_eq(b"hello", b"hello"));
    assert!(!constant_time_eq(b"hello", b"world"));
}
```
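The directory traversal test depends on static file serving refusing any request path that could leave the public root. One common approach is a lexical check over path components, sketched below. resolve_static is a hypothetical name, not FLIN's serve_static, and the sketch assumes the request path has already been URL-decoded and stripped of its leading slash.

```rust
use std::path::{Component, Path, PathBuf};

// Resolve a request path under `root`, refusing anything that could escape it.
fn resolve_static(request_path: &str, root: &Path) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(request_path).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {} // a leading "./" is harmless
            // "..", an absolute path, or a Windows drive prefix could leave the root.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}

fn main() {
    let root = Path::new("/app/public");
    assert_eq!(resolve_static("../../etc/passwd", root), None);
    assert_eq!(resolve_static("css/../../secret", root), None);
    assert_eq!(resolve_static("css/app.css", root), Some(root.join("css/app.css")));
}
```

Rejecting every ".." component, rather than trying to normalize it away, keeps the rule easy to audit. A lexical check does not follow symlinks, so canonicalizing the result and confirming it still starts with the root is a common second layer.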

The Testing Philosophy

Three principles guided our security testing:

Test the failure path. Most tests verify that invalid input is rejected. This is counterintuitive -- developers usually test that valid input works. But security is about what happens when things go wrong. An expired token must be rejected. A tampered payload must be detected. A rate-limited client must be blocked.

No mocking for cryptographic operations. The password hashing tests use real Argon2id. The JWT tests use real HMAC-SHA256. Mocking a cryptographic function defeats the purpose of testing it. If the real implementation has a bug, the mock will not catch it.

Test independence. Each test creates its own state, runs its assertion, and cleans up. Tests can run in any order, in parallel, without affecting each other. A flaky security test is worse than no test at all.

Running the Tests

```bash
cargo test --test security -- --test-threads=4
```

All 75 tests complete in under 10 seconds. The password hashing tests are the slowest (Argon2id with 64 MB memory takes approximately 200 ms per hash), but even these complete quickly enough for the test suite to run on every commit.

The security tests are not optional. They run in CI alongside the unit tests and integration tests. A security regression breaks the build.

In the next article, we cover custom guards and security middleware -- how developers extend FLIN's built-in security with application-specific access control logic.

This is Part 114 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [113] Request Body Validators
- [114] 75 Security Tests: How We Verified Everything (you are here)
- [115] Custom Guards and Security Middleware
- [116] The Intent Engine: Natural Language Database Queries
