#118 -- AI Gateway: 8 Providers, One API

The AI landscape in 2026 is fragmented. OpenAI has GPT-4o. Anthropic has Claude. Google has Gemini. Mistral has its open-weight models. Cohere specializes in embeddings. Groq offers inference at extraordinary speed. DeepInfra hosts dozens of open-source models. Each provider has its own API format, its own authentication scheme, its own pricing model, and its own SDK.

A FLIN application that uses AI should not be locked into a single provider. If OpenAI raises prices, you should be able to switch to DeepInfra. If Anthropic adds a feature you need, you should be able to try it without rewriting your code. If you want to run locally for privacy, you should be able to use a local model.

FLIN's AI Gateway provides a unified interface to eight providers. Your FLIN code calls ai_complete(), ai_embed(), and ai_chat(). The gateway routes the request to the configured provider, translates the API format, and returns a normalized response. Switching providers means editing the ai block in flin.config, not your application code.

The Unified API

Three functions cover the most common AI operations:

// Text completion
response = ai_complete("Summarize this article: " + article.content, {
    max_tokens: 200,
    temperature: 0.3
})

// Chat completion
response = ai_chat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: user_message }
])

// Embeddings
vector = ai_embed("comfortable office chair for long work sessions")

These functions work regardless of which provider is configured. The API is the same whether you are using GPT-4o, Claude, Gemini, or a local Llama model.

Provider Configuration

The AI provider is configured in flin.config:

// flin.config
ai {
    provider = "openai"
    model = "gpt-4o-mini"
    embedding_model = "text-embedding-3-small"
    api_key = env("OPENAI_API_KEY")
}

Switching to another provider:

// Anthropic
ai {
    provider = "anthropic"
    model = "claude-3-haiku"
    api_key = env("ANTHROPIC_API_KEY")
}

// DeepInfra
ai {
    provider = "deepinfra"
    model = "meta-llama/Meta-Llama-3-8B-Instruct"
    api_key = env("DEEPINFRA_API_KEY")
}

// Groq (for speed)
ai {
    provider = "groq"
    model = "llama3-70b-8192"
    api_key = env("GROQ_API_KEY")
}

// Local (no API key needed)
ai {
    provider = "local"
    model = "llama3"
    endpoint = "http://localhost:11434"
}

The application code does not change. The same ai_complete() call works with every provider.
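
Before any request is sent, the gateway has to turn the provider string from flin.config into something it can dispatch on and check that the required settings are present. The Rust sketch below shows one way that validation could look. The `ProviderKind` enum, the `validate` function, and the rule that only the local provider may omit an API key are assumptions made for illustration, not FLIN's actual internals.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the eight provider names used in flin.config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    OpenAi,
    Anthropic,
    Google,
    Mistral,
    Cohere,
    Groq,
    DeepInfra,
    Local,
}

impl FromStr for ProviderKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "openai" => Ok(Self::OpenAi),
            "anthropic" => Ok(Self::Anthropic),
            "google" => Ok(Self::Google),
            "mistral" => Ok(Self::Mistral),
            "cohere" => Ok(Self::Cohere),
            "groq" => Ok(Self::Groq),
            "deepinfra" => Ok(Self::DeepInfra),
            "local" => Ok(Self::Local),
            other => Err(format!("unknown AI provider: {}", other)),
        }
    }
}

/// Assumed rule: hosted providers need an API key, the local provider
/// needs an endpoint instead.
pub fn validate(
    provider: &str,
    api_key: Option<&str>,
    endpoint: Option<&str>,
) -> Result<ProviderKind, String> {
    let kind: ProviderKind = provider.parse()?;
    match kind {
        ProviderKind::Local => {
            if endpoint.is_none() {
                return Err("local provider requires an endpoint".to_string());
            }
        }
        _ => {
            if api_key.map_or(true, |k| k.is_empty()) {
                return Err(format!("{} requires an api_key", provider));
            }
        }
    }
    Ok(kind)
}
```

Failing at startup on a misspelled provider name or a missing key is usually friendlier than failing on the first AI call in production.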

The Eight Supported Providers

Provider	Models	Best For
OpenAI	GPT-4o, GPT-4o Mini	General-purpose, vision
Anthropic	Claude 3 Opus, Sonnet, Haiku	Long context, reasoning
Google	Gemini Pro, Gemini Flash	Multimodal, speed
Mistral	Mistral Large, Medium, Small	European data residency
Cohere	Command R+, Embed v3	Embeddings, RAG
Groq	Llama 3, Mixtral	Ultra-low latency
DeepInfra	50+ open models	Cost optimization
Local	Ollama, llama.cpp	Privacy, offline

Gateway Implementation

The gateway translates between FLIN's unified format and each provider's specific API:

use async_trait::async_trait;

pub struct AiGateway {
    provider: Box<dyn AiProvider>,
    config: AiConfig,
}

// `async fn` in a trait cannot be called through `dyn` on its own, so the
// async_trait crate is used to keep Box<dyn AiProvider> working.
#[async_trait]
pub trait AiProvider: Send + Sync {
    async fn complete(&self, prompt: &str, opts: &CompletionOptions) -> Result<String, AiError>;
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError>;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, AiError>;
}

impl AiGateway {
    pub fn new(config: &AiConfig) -> Result<Self, AiError> {
        // Pick the provider implementation named in flin.config.
        let provider: Box<dyn AiProvider> = match config.provider.as_str() {
            "openai" => Box::new(OpenAiProvider::new(config)?),
            "anthropic" => Box::new(AnthropicProvider::new(config)?),
            "google" => Box::new(GoogleProvider::new(config)?),
            "mistral" => Box::new(MistralProvider::new(config)?),
            "cohere" => Box::new(CohereProvider::new(config)?),
            "groq" => Box::new(GroqProvider::new(config)?),
            "deepinfra" => Box::new(DeepInfraProvider::new(config)?),
            "local" => Box::new(LocalProvider::new(config)?),
            other => return Err(AiError::UnknownProvider(other.into())),
        };

        // The gateway owns its configuration, so clone the borrowed one
        // (assumes AiConfig implements Clone).
        Ok(Self { provider, config: config.clone() })
    }
}
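
The trait refers to `Message`, `CompletionOptions`, `ChatOptions`, and `AiError` without showing their definitions. The following is a minimal sketch of what they could look like, inferred only from how the snippets in this article use them. The `Request` and `EmptyResponse` variants, the `Message::new` helper, and treating `CompletionOptions` as an alias of `ChatOptions` are assumptions.

```rust
use std::fmt;

/// One chat message; `role` is "system", "user", or "assistant".
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

impl Message {
    /// Hypothetical convenience constructor.
    pub fn new(role: &str, content: &str) -> Self {
        Self { role: role.to_string(), content: content.to_string() }
    }
}

/// Every option is optional so each provider can apply its own default.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ChatOptions {
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
}

/// Assumed to have the same shape as ChatOptions.
pub type CompletionOptions = ChatOptions;

#[derive(Debug, PartialEq)]
pub enum AiError {
    /// The provider name in flin.config is not one of the supported ones.
    UnknownProvider(String),
    /// Hypothetical catch-all for HTTP and decoding failures.
    Request(String),
    /// Hypothetical variant for a response that contains no text.
    EmptyResponse,
}

impl fmt::Display for AiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AiError::UnknownProvider(name) => write!(f, "unknown AI provider: {}", name),
            AiError::Request(msg) => write!(f, "AI request failed: {}", msg),
            AiError::EmptyResponse => write!(f, "AI provider returned no content"),
        }
    }
}

impl std::error::Error for AiError {}
```

Keeping the options as `Option` values is what lets a provider write `opts.max_tokens.unwrap_or(1024)` and choose its own fallback.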

Each provider implementation translates the unified request format to the provider's specific API:

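use async_trait::async_trait;
use serde_json::json;

pub struct OpenAiProvider {
    api_key: String,
    model: String,
    base_url: String,
}

// async_trait keeps the trait usable as Box<dyn AiProvider>.
#[async_trait]
impl AiProvider for OpenAiProvider {
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        // Build the request body in OpenAI's chat completions format.
        let body = json!({
            "model": self.model,
            "messages": messages.iter().map(|m| json!({
                "role": m.role,
                "content": m.content
            })).collect::<Vec<_>>(),
            "max_tokens": opts.max_tokens.unwrap_or(1024),
            "temperature": opts.temperature.unwrap_or(0.7),
        });

        let response = reqwest::Client::new()
            .post(format!("{}/chat/completions", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&body)
            .send()
            .await?
            // Turn 4xx/5xx responses into errors instead of parsing them.
            .error_for_status()?;

        let data: OpenAiResponse = response.json().await?;
        // Indexing choices[0] would panic on an empty array.
        // AiError::EmptyResponse is a hypothetical variant.
        data.choices
            .first()
            .map(|choice| choice.message.content.clone())
            .ok_or(AiError::EmptyResponse)
    }

    // complete() and embed() are omitted here.
}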

The Anthropic provider translates to Anthropic's format (which uses system as a separate parameter, not a message):

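// async_trait (from the async_trait crate) keeps the trait usable as Box<dyn AiProvider>.
#[async_trait]
impl AiProvider for AnthropicProvider {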
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        let system = messages.iter()
            .find(|m| m.role == "system")
            .map(|m| m.content.clone());

        let user_messages: Vec<_> = messages.iter()
            .filter(|m| m.role != "system")
            .map(|m| json!({ "role": m.role, "content": m.content }))
            .collect();

        let mut body = json!({
            "model": self.model,
            "messages": user_messages,
            "max_tokens": opts.max_tokens.unwrap_or(1024),
        });

        if let Some(sys) = system {
            body["system"] = json!(sys);
        }

        // ... send request to Anthropic API
    }
}
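
The extraction step can be looked at in isolation. The snippet above keeps only the first system message it finds. The sketch below is a variation that joins several system messages into one prompt, which is an assumption about the desired behavior and not something the article's code does. `split_system` is a hypothetical helper, and `Message` is redefined here only to keep the example self-contained.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Splits a unified message list into (system prompt, conversation messages).
/// Multiple system messages are joined with a blank line between them.
pub fn split_system(messages: &[Message]) -> (Option<String>, Vec<&Message>) {
    let system_parts: Vec<&str> = messages
        .iter()
        .filter(|m| m.role == "system")
        .map(|m| m.content.as_str())
        .collect();

    let system = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };

    // Everything that is not a system message stays in its original order.
    let conversation: Vec<&Message> = messages
        .iter()
        .filter(|m| m.role != "system")
        .collect();

    (system, conversation)
}
```

Because the function returns references, the caller can build the provider-specific JSON body without cloning the conversation.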

Fallback Chains

FLIN supports fallback configuration for high availability:

ai {
    provider = "openai"
    model = "gpt-4o-mini"
    api_key = env("OPENAI_API_KEY")

    fallback {
        provider = "deepinfra"
        model = "meta-llama/Meta-Llama-3-8B-Instruct"
        api_key = env("DEEPINFRA_API_KEY")
    }
}

If the primary provider fails (rate limit, API error, timeout), the gateway automatically retries with the fallback provider. The application code is unaware of the failover.
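
A minimal sketch of how such a failover could work is shown below. It uses a simplified synchronous `Provider` trait in place of the async one, and the `FallbackChain` type, the error variants, and the policy of not retrying invalid requests are all assumptions for illustration.

```rust
/// Hypothetical error type: only some failures are worth retrying elsewhere.
#[derive(Debug, Clone, PartialEq)]
pub enum AiError {
    RateLimited,
    Timeout,
    Api(String),
    InvalidRequest(String),
}

impl AiError {
    /// Assumed policy: a malformed request would fail on any provider,
    /// so it is returned immediately instead of being retried.
    pub fn is_retryable(&self) -> bool {
        !matches!(self, AiError::InvalidRequest(_))
    }
}

/// Simplified synchronous stand-in for the async provider trait.
pub trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, AiError>;
}

pub struct FallbackChain {
    providers: Vec<Box<dyn Provider>>,
}

impl FallbackChain {
    pub fn new(providers: Vec<Box<dyn Provider>>) -> Self {
        Self { providers }
    }

    /// Tries each provider in order and returns the first success,
    /// or the last error if every provider failed.
    pub fn complete(&self, prompt: &str) -> Result<String, AiError> {
        let mut last_err = AiError::Api("no providers configured".to_string());
        for provider in &self.providers {
            match provider.complete(prompt) {
                Ok(text) => return Ok(text),
                Err(err) if err.is_retryable() => last_err = err,
                Err(err) => return Err(err),
            }
        }
        Err(last_err)
    }
}

/// Test double that always returns the same result.
pub struct Fixed(pub Result<String, AiError>);

impl Provider for Fixed {
    fn complete(&self, _prompt: &str) -> Result<String, AiError> {
        self.0.clone()
    }
}
```

With a rate-limited primary and a healthy fallback, `complete` returns the fallback's text and the caller never sees the first error.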

Cost Optimization

Different providers have dramatically different pricing:

Provider	Model	Cost per 1M tokens
OpenAI	GPT-4o Mini	$0.15 input / $0.60 output
Anthropic	Claude 3 Haiku	$0.25 input / $1.25 output
DeepInfra	Llama 3 8B	$0.06 input / $0.06 output
Groq	Llama 3 70B	$0.59 input / $0.79 output

For the Intent Engine's query translation, where the task is relatively simple, a smaller model is often accurate enough. Going by the table above, Llama 3 8B on DeepInfra costs 2.5x less than GPT-4o Mini for input tokens and 10x less for output tokens. FLIN's gateway makes this switch trivial.
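
To make the comparison concrete, the sketch below computes the cost of a workload from the per-million-token prices in the table. The `Pricing` type, the constant names, and the workload of 10 million input tokens and 2 million output tokens are made up for illustration.

```rust
/// Price in USD per one million tokens, taken from the table above.
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

pub const GPT_4O_MINI: Pricing = Pricing {
    input_per_million: 0.15,
    output_per_million: 0.60,
};

pub const LLAMA_3_8B_DEEPINFRA: Pricing = Pricing {
    input_per_million: 0.06,
    output_per_million: 0.06,
};

/// Cost in USD of a workload with the given token counts.
pub fn cost_usd(pricing: Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * pricing.input_per_million
        + output_tokens as f64 * pricing.output_per_million)
        / 1_000_000.0
}
```

For that workload, GPT-4o Mini comes to 10 × $0.15 + 2 × $0.60 = $2.70 and Llama 3 8B on DeepInfra to 10 × $0.06 + 2 × $0.06 = $0.72, a ratio of 3.75. The overall saving depends on the mix of input and output tokens, which is why the per-direction prices matter more than a single headline multiplier.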

Using AI in FLIN Applications

Beyond the Intent Engine and semantic search, FLIN developers can use AI directly in their applications:

// Summarize content
fn summarize(article) {
    ai_complete("Summarize this article in 2 sentences: " + article.content, {
        max_tokens: 100,
        temperature: 0.3
    })
}

// Classify support tickets
fn classify_ticket(ticket) {
    ai_chat([
        { role: "system", content: "Classify the ticket into: billing, technical, feature_request, other. Reply with just the category." },
        { role: "user", content: ticket.subject + "\n" + ticket.description }
    ])
}

// Generate product descriptions
fn generate_description(product) {
    ai_complete("Write a compelling product description for: " + product.name + ". Features: " + product.features, {
        max_tokens: 200,
        temperature: 0.7
    })
}

These are regular FLIN function calls. They work with any configured provider. They can be called from route handlers, scheduled tasks, or interactive views.

Rate Limiting and Caching

The gateway includes built-in rate limiting to respect provider limits:

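use std::sync::atomic::{AtomicU32, AtomicU64};

/// Tracks request and token usage for one provider within the current window.
pub struct ProviderRateLimiter {
    requests_per_minute: u32,    // provider's request limit
    tokens_per_minute: u32,      // provider's token limit
    current_requests: AtomicU32, // requests used in the current window
    current_tokens: AtomicU32,   // tokens used in the current window
    window_start: AtomicU64,     // timestamp at which the current window began
}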
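
The struct only shows the state, not how it is checked. The sketch below illustrates one possible fixed-window check. To keep the logic easy to verify, it swaps the atomics for a single `Mutex` and takes the current time as a parameter. `FixedWindowLimiter` and `try_acquire` are hypothetical names, not part of FLIN.

```rust
use std::sync::Mutex;

/// Counters for the current one-minute window, guarded by one lock.
struct Window {
    start_secs: u64,
    requests: u32,
    tokens: u32,
}

pub struct FixedWindowLimiter {
    requests_per_minute: u32,
    tokens_per_minute: u32,
    window: Mutex<Window>,
}

impl FixedWindowLimiter {
    pub fn new(requests_per_minute: u32, tokens_per_minute: u32) -> Self {
        Self {
            requests_per_minute,
            tokens_per_minute,
            window: Mutex::new(Window { start_secs: 0, requests: 0, tokens: 0 }),
        }
    }

    /// Returns true and records the usage if the request fits into the
    /// current window. `now_secs` is passed in so the logic is testable.
    pub fn try_acquire(&self, now_secs: u64, tokens: u32) -> bool {
        let mut w = self.window.lock().unwrap();

        // Start a fresh window once 60 seconds have elapsed.
        if now_secs.saturating_sub(w.start_secs) >= 60 {
            w.start_secs = now_secs;
            w.requests = 0;
            w.tokens = 0;
        }

        let fits = w.requests < self.requests_per_minute
            && w.tokens.saturating_add(tokens) <= self.tokens_per_minute;
        if fits {
            w.requests += 1;
            w.tokens += tokens;
        }
        fits
    }
}
```

A fixed window is the simplest scheme and allows short bursts at window boundaries. A sliding window or token bucket would smooth those out at the cost of more state.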

And response caching for repeated queries:

// First call: API request (200ms)
summary = ai_complete("Summarize: " + article.content)

// Same input later: cached response (< 1ms)
summary = ai_complete("Summarize: " + article.content)

The cache key is the hash of the full request (prompt, model, temperature). Responses are cached for a configurable duration (default: 1 hour).
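
A rough sketch of that scheme follows, using only the Rust standard library. The `cache_key` function and `ResponseCache` type are hypothetical, the choice of `DefaultHasher` is an assumption, and the clock is passed in as plain seconds to keep the example deterministic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hashes the request fields named in the text: prompt, model, temperature.
pub fn cache_key(prompt: &str, model: &str, temperature: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    prompt.hash(&mut hasher);
    model.hash(&mut hasher);
    // f32 does not implement Hash, so hash its bit pattern instead.
    temperature.to_bits().hash(&mut hasher);
    hasher.finish()
}

pub struct ResponseCache {
    ttl_secs: u64,
    // key -> (response text, time the entry was stored, in seconds)
    entries: HashMap<u64, (String, u64)>,
}

impl ResponseCache {
    pub fn new(ttl_secs: u64) -> Self {
        Self { ttl_secs, entries: HashMap::new() }
    }

    /// Returns the cached response if it exists and has not expired.
    pub fn get(&self, key: u64, now_secs: u64) -> Option<&str> {
        self.entries
            .get(&key)
            .filter(|(_, stored_at)| now_secs.saturating_sub(*stored_at) < self.ttl_secs)
            .map(|(response, _)| response.as_str())
    }

    pub fn put(&mut self, key: u64, response: String, now_secs: u64) {
        self.entries.insert(key, (response, now_secs));
    }
}
```

Including the model and temperature in the key means that changing either one in flin.config naturally bypasses stale entries instead of serving a response produced under different settings.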

Why a Gateway, Not a Library

The alternative to a gateway is provider-specific libraries: openai-sdk, anthropic-sdk, google-ai-sdk. Each has its own API, its own error handling, and its own types. Switching providers means rewriting every AI call in your application.

FLIN's gateway makes provider selection a configuration decision, not a code decision. Your application logic expresses what it wants ("summarize this text," "classify this ticket," "embed this query"), and the gateway handles how to get it from whichever provider is configured.

This separation of concerns is especially important for the Intent Engine and semantic search, which are core language features. They should not stop working because you switched from OpenAI to Anthropic.

In the next article, we dive into FastEmbed integration -- how FLIN generates embeddings locally without any API call, enabling offline semantic search and privacy-first applications.

This is Part 118 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [117] Semantic Search and Vector Storage
- [118] AI Gateway: 8 Providers, One API (you are here)
- [119] FastEmbed Integration for Embeddings
- [120] RAG: Retrieval, Reranking, and Source Attribution
