#118 -- AI Gateway: 8 Providers, One API

How FLIN's AI Gateway provides a unified interface to OpenAI, Anthropic, DeepInfra, Google, Mistral, Cohere, Groq, and local models -- switch providers by changing one line of configuration.

Juste A. Gnimavo (Thales) & Claude | March 26, 2026 | 7 min

The AI landscape in 2026 is fragmented. OpenAI has GPT-4o. Anthropic has Claude. Google has Gemini. Mistral has its open-weight models. Cohere specializes in embeddings. Groq offers inference at extraordinary speed. DeepInfra hosts dozens of open-source models. Each provider has its own API format, its own authentication scheme, its own pricing model, and its own SDK.

A FLIN application that uses AI should not be locked into a single provider. If OpenAI raises prices, you should be able to switch to DeepInfra. If Anthropic adds a feature you need, you should be able to try it without rewriting your code. If you want to run locally for privacy, you should be able to use a local model.

FLIN's AI Gateway provides a unified interface to eight providers. Your FLIN code calls ai_complete(), ai_embed(), and ai_chat(). The gateway routes the request to the configured provider, translates the API format, and returns a normalized response. Switching providers is one line in flin.config.

The Unified API

Three functions cover the most common AI operations:

```flin
// Text completion
response = ai_complete("Summarize this article: " + article.content, {
    max_tokens: 200,
    temperature: 0.3
})

// Chat completion
response = ai_chat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: user_message }
])

// Embeddings
vector = ai_embed("comfortable office chair for long work sessions")
```

These functions work regardless of which provider is configured. The API is the same whether you are using GPT-4o, Claude, Gemini, or a local Llama model.

Provider Configuration

The AI provider is configured in flin.config:

```flin
// flin.config
ai {
    provider = "openai"
    model = "gpt-4o-mini"
    embedding_model = "text-embedding-3-small"
    api_key = env("OPENAI_API_KEY")
}
```

Switching to another provider:

```flin
// Anthropic
ai {
    provider = "anthropic"
    model = "claude-3-haiku"
    api_key = env("ANTHROPIC_API_KEY")
}

// DeepInfra
ai {
    provider = "deepinfra"
    model = "meta-llama/Meta-Llama-3-8B-Instruct"
    api_key = env("DEEPINFRA_API_KEY")
}

// Groq (for speed)
ai {
    provider = "groq"
    model = "llama3-70b-8192"
    api_key = env("GROQ_API_KEY")
}

// Local (no API key needed)
ai {
    provider = "local"
    model = "llama3"
    endpoint = "http://localhost:11434"
}
```

The application code does not change. The same ai_complete() call works with every provider.
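On the Rust side, the `ai { ... }` block has to land in some configuration type before the gateway can use it. The article names that type `AiConfig` but does not show it, so the following is only a sketch under assumptions: the field names mirror the keys in flin.config, while `ConfigError`, `PROVIDERS`, and `validate` are hypothetical names introduced for illustration.

```rust
/// Hypothetical shape of the parsed `ai { ... }` block. Field names mirror
/// the keys shown in flin.config; the struct layout itself is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct AiConfig {
    pub provider: String,
    pub model: String,
    pub embedding_model: Option<String>,
    pub api_key: Option<String>,
    pub endpoint: Option<String>,
}

/// Hypothetical validation errors (not taken from the FLIN source).
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    UnknownProvider(String),
    MissingApiKey(String),
}

/// The eight provider identifiers named in this article.
pub const PROVIDERS: [&str; 8] = [
    "openai", "anthropic", "google", "mistral",
    "cohere", "groq", "deepinfra", "local",
];

/// Check that a config is usable before any request is sent.
pub fn validate(config: &AiConfig) -> Result<(), ConfigError> {
    if !PROVIDERS.contains(&config.provider.as_str()) {
        return Err(ConfigError::UnknownProvider(config.provider.clone()));
    }
    // The local provider needs no API key; every hosted provider does.
    let key_missing = config.api_key.as_deref().map_or(true, str::is_empty);
    if config.provider != "local" && key_missing {
        return Err(ConfigError::MissingApiKey(config.provider.clone()));
    }
    Ok(())
}
```

Validating at startup means a missing key shows up when the application boots rather than on the first `ai_complete()` call.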

The Eight Supported Providers

| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o Mini | General-purpose, vision |
| Anthropic | Claude 3 Opus, Sonnet, Haiku | Long context, reasoning |
| Google | Gemini Pro, Gemini Flash | Multimodal, speed |
| Mistral | Mistral Large, Medium, Small | European data residency |
| Cohere | Command R+, Embed v3 | Embeddings, RAG |
| Groq | Llama 3, Mixtral | Ultra-low latency |
| DeepInfra | 50+ open models | Cost optimization |
| Local | Ollama, llama.cpp | Privacy, offline |

Gateway Implementation

The gateway translates between FLIN's unified format and each provider's specific API:

```rust
use async_trait::async_trait;

pub struct AiGateway {
    provider: Box<dyn AiProvider>,
    config: AiConfig,
}

// A trait with `async fn` methods cannot be used as `dyn AiProvider` on its
// own; the `async_trait` crate boxes the returned futures to make it work.
#[async_trait]
pub trait AiProvider: Send + Sync {
    async fn complete(&self, prompt: &str, opts: &CompletionOptions) -> Result<String, AiError>;
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError>;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, AiError>;
}

impl AiGateway {
    pub fn new(config: &AiConfig) -> Result<Self, AiError> {
        // Pick the provider implementation named in flin.config.
        let provider: Box<dyn AiProvider> = match config.provider.as_str() {
            "openai" => Box::new(OpenAiProvider::new(config)?),
            "anthropic" => Box::new(AnthropicProvider::new(config)?),
            "google" => Box::new(GoogleProvider::new(config)?),
            "mistral" => Box::new(MistralProvider::new(config)?),
            "cohere" => Box::new(CohereProvider::new(config)?),
            "groq" => Box::new(GroqProvider::new(config)?),
            "deepinfra" => Box::new(DeepInfraProvider::new(config)?),
            "local" => Box::new(LocalProvider::new(config)?),
            other => return Err(AiError::UnknownProvider(other.into())),
        };

        // The gateway owns its configuration, so clone the borrowed value
        // (this assumes AiConfig implements Clone).
        Ok(Self { provider, config: config.clone() })
    }
}
```

Each provider implementation translates the unified request format to the provider's specific API:

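```rust
use async_trait::async_trait;
use serde_json::json;

pub struct OpenAiProvider {
    api_key: String,
    model: String,
    base_url: String,
}

#[async_trait]
impl AiProvider for OpenAiProvider {
    // complete() and embed() are omitted here for brevity.
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        // Build the request body in OpenAI's chat completions format.
        let body = json!({
            "model": self.model,
            "messages": messages.iter().map(|m| json!({
                "role": m.role,
                "content": m.content
            })).collect::<Vec<_>>(),
            "max_tokens": opts.max_tokens.unwrap_or(1024),
            "temperature": opts.temperature.unwrap_or(0.7),
        });

        let response = reqwest::Client::new()
            .post(format!("{}/chat/completions", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&body)
            .send()
            .await?
            // Turn 4xx/5xx responses into errors instead of parsing them as success.
            .error_for_status()?;

        let data: OpenAiResponse = response.json().await?;
        // Indexing panics if `choices` is empty; production code should
        // return an AiError in that case instead.
        Ok(data.choices[0].message.content.clone())
    }
}
```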

The Anthropic provider translates to Anthropic's format (which uses system as a separate parameter, not a message):

```rust
#[async_trait]
impl AiProvider for AnthropicProvider {
    async fn chat(&self, messages: &[Message], opts: &ChatOptions) -> Result<String, AiError> {
        // Anthropic takes the system prompt as a top-level field.
        // Note that `find` keeps only the first system message.
        let system = messages.iter()
            .find(|m| m.role == "system")
            .map(|m| m.content.clone());

        // Everything that is not a system message goes into `messages`.
        let user_messages: Vec<_> = messages.iter()
            .filter(|m| m.role != "system")
            .map(|m| json!({ "role": m.role, "content": m.content }))
            .collect();

        let mut body = json!({
            "model": self.model,
            "messages": user_messages,
            "max_tokens": opts.max_tokens.unwrap_or(1024),
        });

        if let Some(sys) = system {
            body["system"] = json!(sys);
        }

        // ... send request to Anthropic API
    }
}
```
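The system-prompt extraction above can be pulled out into a small pure function, which makes it testable without a network call. The sketch below is an illustration rather than FLIN's actual code: `split_system` is a hypothetical helper, the `Message` struct is a minimal stand-in for the gateway's type, and joining several system messages with a blank line is an assumption (the listing above keeps only the first one).

```rust
/// Minimal stand-in for the gateway's message type (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Split a unified message list into an optional system prompt and the
/// remaining conversation turns, as Anthropic's format expects.
/// Assumption: multiple system messages are joined with a blank line.
pub fn split_system(messages: &[Message]) -> (Option<String>, Vec<Message>) {
    let system_parts: Vec<&str> = messages
        .iter()
        .filter(|m| m.role == "system")
        .map(|m| m.content.as_str())
        .collect();

    let system = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };

    // Keep user and assistant turns in their original order.
    let turns = messages
        .iter()
        .filter(|m| m.role != "system")
        .cloned()
        .collect();

    (system, turns)
}
```

Because the function does no I/O, the translation rule can be unit-tested independently of the HTTP layer.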

Fallback Chains

FLIN supports fallback configuration for high availability:

```flin
ai {
    provider = "openai"
    model = "gpt-4o-mini"
    api_key = env("OPENAI_API_KEY")

    fallback {
        provider = "deepinfra"
        model = "meta-llama/Meta-Llama-3-8B-Instruct"
        api_key = env("DEEPINFRA_API_KEY")
    }
}
```

If the primary provider fails (rate limit, API error, timeout), the gateway automatically retries with the fallback provider. The application code is unaware of the failover.
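The failover behavior can be sketched as a loop over an ordered list of providers. This is a simplified, synchronous illustration, not the gateway's real code: the `Provider` trait, `FallbackChain`, `Stub`, and the `AiError` variants shown here are hypothetical, and treating invalid requests as non-retryable is an assumption (a malformed prompt would presumably fail on the fallback too).

```rust
/// Hypothetical error variants covering the failure modes named above.
#[derive(Debug, Clone, PartialEq)]
pub enum AiError {
    RateLimited,
    Timeout,
    Api(String),
    InvalidRequest(String),
}

impl AiError {
    /// Assumption: only provider-side failures justify trying the fallback.
    pub fn is_retryable(&self) -> bool {
        matches!(self, AiError::RateLimited | AiError::Timeout | AiError::Api(_))
    }
}

/// Synchronous stand-in for the article's async `AiProvider` trait.
pub trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, AiError>;
}

/// Providers in priority order: primary first, then fallbacks.
pub struct FallbackChain {
    providers: Vec<Box<dyn Provider>>,
}

impl FallbackChain {
    pub fn new(providers: Vec<Box<dyn Provider>>) -> Self {
        Self { providers }
    }

    /// Try each provider in turn and return the first success.
    pub fn complete(&self, prompt: &str) -> Result<String, AiError> {
        let mut last_err = AiError::Api("no providers configured".to_string());
        for provider in &self.providers {
            match provider.complete(prompt) {
                Ok(text) => return Ok(text),
                // Retryable failure: remember it and move to the next provider.
                Err(err) if err.is_retryable() => last_err = err,
                // Non-retryable failure: surface it immediately.
                Err(err) => return Err(err),
            }
        }
        Err(last_err)
    }
}

/// Stub provider that always returns a fixed result (for demonstration only).
pub struct Stub(pub Result<String, AiError>);

impl Provider for Stub {
    fn complete(&self, _prompt: &str) -> Result<String, AiError> {
        self.0.clone()
    }
}
```

With a rate-limited stub as primary and a working stub as fallback, `complete` returns the fallback's text, so the caller never sees the primary's error.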

Cost Optimization

Different providers have dramatically different pricing:

| Provider | Model | Cost per 1M tokens |
|---|---|---|
| OpenAI | GPT-4o Mini | $0.15 input / $0.60 output |
| Anthropic | Claude 3 Haiku | $0.25 input / $1.25 output |
| DeepInfra | Llama 3 8B | $0.06 input / $0.06 output |
| Groq | Llama 3 70B | $0.59 input / $0.79 output |

For the Intent Engine's query translation, where the task is relatively simple, a smaller model can be far cheaper with comparable accuracy. Going by the table above, Llama 3 8B on DeepInfra costs 2.5x less per input token and 10x less per output token than GPT-4o Mini. FLIN's gateway makes this switch trivial.
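To see what the per-token prices mean for a concrete workload, the arithmetic can be written as a small function. The prices below are copied from the table above. The `Pricing` struct, the constant names, and `cost_usd` are hypothetical names for illustration, and the workload of 10M input and 2M output tokens per month is an invented example.

```rust
/// Price in US dollars per one million tokens.
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

// Prices copied from the table in this article.
pub const GPT_4O_MINI: Pricing = Pricing { input_per_million: 0.15, output_per_million: 0.60 };
pub const CLAUDE_3_HAIKU: Pricing = Pricing { input_per_million: 0.25, output_per_million: 1.25 };
pub const LLAMA_3_8B_DEEPINFRA: Pricing = Pricing { input_per_million: 0.06, output_per_million: 0.06 };
pub const LLAMA_3_70B_GROQ: Pricing = Pricing { input_per_million: 0.59, output_per_million: 0.79 };

/// Total cost in dollars for a given number of input and output tokens.
pub fn cost_usd(pricing: Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * pricing.input_per_million
        + output_tokens as f64 * pricing.output_per_million)
        / 1_000_000.0
}
```

For the example workload, GPT-4o Mini comes to 10 × $0.15 + 2 × $0.60 = $2.70 per month, while Llama 3 8B on DeepInfra comes to 10 × $0.06 + 2 × $0.06 = $0.72. The real saving depends on how the workload splits between input and output tokens.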

Using AI in FLIN Applications

Beyond the Intent Engine and semantic search, FLIN developers can use AI directly in their applications:

```flin
// Summarize content
fn summarize(article) {
    ai_complete("Summarize this article in 2 sentences: " + article.content, {
        max_tokens: 100,
        temperature: 0.3
    })
}

// Classify support tickets
fn classify_ticket(ticket) {
    ai_chat([
        { role: "system", content: "Classify the ticket into: billing, technical, feature_request, other. Reply with just the category." },
        { role: "user", content: ticket.subject + "\n" + ticket.description }
    ])
}

// Generate product descriptions
fn generate_description(product) {
    ai_complete("Write a compelling product description for: " + product.name + ". Features: " + product.features, {
        max_tokens: 200,
        temperature: 0.7
    })
}
```

These are regular FLIN function calls. They work with any configured provider. They can be called from route handlers, scheduled tasks, or interactive views.

Rate Limiting and Caching

The gateway includes built-in rate limiting to respect provider limits:

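```rust
use std::sync::atomic::{AtomicU32, AtomicU64};

pub struct ProviderRateLimiter {
    // Limits imposed by the provider.
    requests_per_minute: u32,
    tokens_per_minute: u32,
    // Usage counted so far in the current one-minute window.
    current_requests: AtomicU32,
    current_tokens: AtomicU32,
    // Timestamp at which the current window began.
    window_start: AtomicU64,
}
```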
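The article shows only the limiter's fields, so the admission logic below is a guess at how such a limiter might work, written as a fixed one-minute window. To keep the sketch obviously correct it swaps the atomics for a single `Mutex`. `FixedWindowLimiter`, `Window`, and `try_acquire` are hypothetical names, and the caller passes the current time in seconds so the behavior is deterministic.

```rust
use std::sync::Mutex;

/// Usage counted inside the current one-minute window.
struct Window {
    start_secs: u64,
    requests: u32,
    tokens: u32,
}

/// Fixed-window limiter guarded by a Mutex (a simplification of the
/// atomics-based struct above, not FLIN's actual implementation).
pub struct FixedWindowLimiter {
    requests_per_minute: u32,
    tokens_per_minute: u32,
    window: Mutex<Window>,
}

impl FixedWindowLimiter {
    pub fn new(requests_per_minute: u32, tokens_per_minute: u32) -> Self {
        Self {
            requests_per_minute,
            tokens_per_minute,
            window: Mutex::new(Window { start_secs: 0, requests: 0, tokens: 0 }),
        }
    }

    /// Returns true and records the usage if the request fits in the
    /// current window; returns false if it would exceed either limit.
    pub fn try_acquire(&self, now_secs: u64, tokens: u32) -> bool {
        let mut w = self.window.lock().unwrap();

        // Start a fresh window once 60 seconds have passed.
        if now_secs.saturating_sub(w.start_secs) >= 60 {
            w.start_secs = now_secs;
            w.requests = 0;
            w.tokens = 0;
        }

        // Reject if the token budget would be exceeded (or would overflow).
        let new_tokens = match w.tokens.checked_add(tokens) {
            Some(t) if t <= self.tokens_per_minute => t,
            _ => return false,
        };
        // Reject if the request budget is already used up.
        if w.requests >= self.requests_per_minute {
            return false;
        }

        w.requests += 1;
        w.tokens = new_tokens;
        true
    }
}
```

A fixed window is the simplest policy that matches the struct's fields. It allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.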

And response caching for repeated queries:

```flin
// First call: API request (200ms)
summary = ai_complete("Summarize: " + article.content)

// Same input later: cached response (< 1ms)
summary = ai_complete("Summarize: " + article.content)
```

The cache key is the hash of the full request (prompt, model, temperature). Responses are cached for a configurable duration (default: 1 hour).
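A cache keyed this way can be sketched with the standard library alone. The code below is an illustration under assumptions, not the gateway's implementation: `cache_key` and `ResponseCache` are hypothetical names, the hash is Rust's `DefaultHasher` (fine for an in-memory cache, but not stable across Rust versions, so unsuitable for a persisted one), and time is passed in as seconds to keep expiry deterministic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the parts of the request that determine the response.
pub fn cache_key(prompt: &str, model: &str, temperature: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    prompt.hash(&mut hasher);
    model.hash(&mut hasher);
    // f32 does not implement Hash, so hash its bit pattern instead.
    temperature.to_bits().hash(&mut hasher);
    hasher.finish()
}

/// In-memory response cache with a time-to-live per entry.
pub struct ResponseCache {
    ttl_secs: u64,
    // key -> (time stored in seconds, cached response)
    entries: HashMap<u64, (u64, String)>,
}

impl ResponseCache {
    pub fn new(ttl_secs: u64) -> Self {
        Self { ttl_secs, entries: HashMap::new() }
    }

    /// Return the cached response if it exists and has not expired.
    pub fn get(&self, key: u64, now_secs: u64) -> Option<&str> {
        self.entries
            .get(&key)
            .filter(|(stored_at, _)| now_secs.saturating_sub(*stored_at) < self.ttl_secs)
            .map(|(_, response)| response.as_str())
    }

    /// Store a response, replacing any previous entry for the same key.
    pub fn put(&mut self, key: u64, response: String, now_secs: u64) {
        self.entries.insert(key, (now_secs, response));
    }
}
```

Including the model and temperature in the key means that switching providers or changing sampling settings never serves a response generated under the old settings.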

Why a Gateway, Not a Library

The alternative to a gateway is provider-specific libraries: openai-sdk, anthropic-sdk, google-ai-sdk. Each with its own API, its own error handling, its own types. Switching providers means rewriting every AI call in your application.

FLIN's gateway makes provider selection a configuration decision, not a code decision. Your application logic expresses what it wants ("summarize this text," "classify this ticket," "embed this query"), and the gateway handles how to get it from whichever provider is configured.

This separation of concerns is especially important for the Intent Engine and semantic search, which are core language features. They should not stop working because you switched from OpenAI to Anthropic.

In the next article, we dive into FastEmbed integration -- how FLIN generates embeddings locally without any API call, enabling offline semantic search and privacy-first applications.


This is Part 118 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [117] Semantic Search and Vector Storage
- [118] AI Gateway: 8 Providers, One API (you are here)
- [119] FastEmbed Integration for Embeddings
- [120] RAG: Retrieval, Reranking, and Source Attribution
