
The Embedding Model Choice Crisis

When switching embedding models broke every existing vector in the database.

Thales & Claude | March 25, 2026 · 8 min read

Tags: flin, bug, embeddings, model-selection, ai, vectors

When you build a programming language with AI capabilities baked in, you inherit the AI ecosystem's dependencies. FLIN's semantic search system -- the search, ask, and hybrid_search operations -- relies on embedding models to convert text into vector representations. The choice of embedding model affects search quality, memory usage, startup time, and multilingual support.

On January 17, 2026, we discovered that the embedding model we planned to use did not exist in the library we depended on. This was not a bug in the traditional sense -- no code was broken, no tests failed. It was a constraint crisis: the gap between what we wanted and what was available.

The Plan vs. Reality

FLIN uses the fastembed Rust crate for local embedding generation. "Local" is key -- FLIN's design philosophy requires that AI features work without external API calls. No OpenAI key needed. No internet connection required. The embedding model runs inside the FLIN process.

Our original plan was to offer two embedding models:

| Model | Dimensions | Size | Purpose |
| --- | --- | --- | --- |
| multilingual-e5-small | 384 | ~100 MB | Default, most users |
| bge-m3 | 1024 | ~600 MB | Power users, higher quality |

BGE-M3 was the ideal choice for the high-quality option. It supports 100+ languages, produces 1024-dimensional vectors, and consistently ranks at the top of multilingual embedding benchmarks. It would give FLIN users access to state-of-the-art semantic search quality.

But BGE-M3 was not available in fastembed v4.9.1. There was an open GitHub issue (#538) requesting it, but no implementation. The model existed in the Python version of fastembed but had not been ported to the Rust crate.

This is a common frustration when building on the Rust ecosystem: Python libraries often have broader model support because the Python ML ecosystem is more mature. Rust offers better performance and memory safety, but at the cost of model availability.

The Alternative: multilingual-e5-large

With BGE-M3 unavailable, we evaluated the alternatives in fastembed v4.9.1:

```rust
// Available multilingual models in fastembed 4.9.1
EmbeddingModel::MultilingualE5Small   // 384-dim, ~100 MB
EmbeddingModel::MultilingualE5Large   // 1024-dim, ~600 MB
EmbeddingModel::AllMiniLmL6V2         // 384-dim, English only
EmbeddingModel::BGESmallEnV15         // 384-dim, English only
```

The English-only models were immediately disqualified. FLIN is designed for global use -- it powers applications from Abidjan to Zurich, in French, English, Arabic, and dozens of other languages. An English-only embedding model would make semantic search useless for the majority of FLIN's target users.

multilingual-e5-large was the clear choice: 1024 dimensions (matching our planned BGE-M3 dimensions), 100+ language support, and a well-validated architecture. It was not quite as capable as BGE-M3 on benchmarks, but the difference was marginal for most use cases.

The Architecture: Model Choice as Configuration

Rather than hardcoding the model choice, we built an architecture that allows users to select their embedding model through configuration:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, Serialize, Deserialize)]
pub enum EmbeddingModelChoice {
    #[default]
    MultilingualE5Small,  // 384-dim, ~100 MB
    MultilingualE5Large,  // 1024-dim, ~600 MB
}

impl EmbeddingModelChoice {
    pub fn dimensions(&self) -> usize {
        match self {
            Self::MultilingualE5Small => 384,
            Self::MultilingualE5Large => 1024,
        }
    }

    pub fn max_context(&self) -> usize {
        512  // Same for both models
    }

    pub fn model_size_mb(&self) -> usize {
        match self {
            Self::MultilingualE5Small => 100,
            Self::MultilingualE5Large => 600,
        }
    }
}
```

The FLIN configuration syntax for model selection is clean:

```flin
ai {
    embeddings: "local"
    model: "multilingual-e5-large"
}
```

Without the model option, the default (multilingual-e5-small) is used. This means existing applications work unchanged, and power users can opt into the larger model with a single configuration line.

Dual Singletons for Efficient Caching

Embedding models are expensive to initialize. Loading the model weights into memory takes several seconds and consumes significant RAM. We cannot load the model on every request.

The standard solution is a singleton -- load the model once and reuse it. But with two possible models, we needed two singletons:

```rust
static E5_SMALL: OnceCell<FastEmbedEngine> = OnceCell::new();
static E5_LARGE: OnceCell<FastEmbedEngine> = OnceCell::new();

impl FastEmbedEngine {
    pub fn global(choice: EmbeddingModelChoice) -> Result<&'static Self, EmbeddingError> {
        match choice {
            EmbeddingModelChoice::MultilingualE5Small => {
                E5_SMALL.get_or_try_init(|| Self::new(choice))
            }
            EmbeddingModelChoice::MultilingualE5Large => {
                E5_LARGE.get_or_try_init(|| Self::new(choice))
            }
        }
    }

    pub fn global_default() -> Result<&'static Self, EmbeddingError> {
        Self::global(EmbeddingModelChoice::default())
    }
}
```

Each model is loaded on first use and cached for the lifetime of the process. If an application only uses e5-small, the e5-large model is never loaded (and its 600 MB of memory is never allocated).
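The lazy-loading behavior can be sketched with the standard library's `OnceLock` (a stand-in here for `once_cell`'s `OnceCell`; the `Engine` stub is hypothetical and loads no real model weights):

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for FastEmbedEngine: the real engine loads
// ONNX model weights; this stub only records which model it represents.
struct Engine {
    model: &'static str,
}

static SMALL: OnceLock<Engine> = OnceLock::new();

fn global_small() -> &'static Engine {
    // The closure runs at most once; every later call returns the cached engine.
    SMALL.get_or_init(|| Engine { model: "multilingual-e5-small" })
}

fn main() {
    let a = global_small();
    let b = global_small();
    // Both calls yield the exact same instance: no repeated model loads.
    assert!(std::ptr::eq(a, b));
    assert_eq!(a.model, "multilingual-e5-small");
}
```

The same pattern, with one static per variant, gives the "pay only for what you use" memory behavior described above.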

Config Parsing

The model choice needed to flow from the FLIN configuration file through to the embedding engine. We added parsing for the model option in the AI config block:

```rust
// src/database/config.rs
pub struct AiConfig {
    pub provider: EmbeddingProvider,
    pub local_model: EmbeddingModelChoice,  // NEW
    // ...
}

// Parse "model" option
"model" => {
    config.local_model = match value.as_str() {
        "multilingual-e5-large" | "e5-large" => EmbeddingModelChoice::MultilingualE5Large,
        "multilingual-e5-small" | "e5-small" => EmbeddingModelChoice::MultilingualE5Small,
        _ => EmbeddingModelChoice::default(),
    };
}
```

The parser accepts both full names (multilingual-e5-large) and short names (e5-large) for convenience.
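The alias mapping can be exercised in isolation with a self-contained sketch (the enum is redefined locally for illustration; the real definition lives alongside the config parser):

```rust
// Local redefinition for a standalone example; FLIN's actual enum
// also derives Serialize/Deserialize.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum EmbeddingModelChoice {
    #[default]
    MultilingualE5Small,
    MultilingualE5Large,
}

// Mirrors the match arm in the config parser: full and short names
// map to the same variant, unknown values fall back to the default.
fn parse_model(value: &str) -> EmbeddingModelChoice {
    match value {
        "multilingual-e5-large" | "e5-large" => EmbeddingModelChoice::MultilingualE5Large,
        "multilingual-e5-small" | "e5-small" => EmbeddingModelChoice::MultilingualE5Small,
        _ => EmbeddingModelChoice::default(),
    }
}

fn main() {
    assert_eq!(parse_model("e5-large"), EmbeddingModelChoice::MultilingualE5Large);
    assert_eq!(parse_model("multilingual-e5-small"), EmbeddingModelChoice::MultilingualE5Small);
    // Unrecognized values silently fall back to the default (e5-small).
    assert_eq!(parse_model("bge-m3"), EmbeddingModelChoice::MultilingualE5Small);
}
```

Falling back to the default on unknown values keeps old configs working, at the cost of hiding typos; a stricter parser could return an error instead.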

Semantic Search Integration

The embedding provider enum was updated to carry the model choice:

```rust
pub enum EmbeddingProvider {
    FastEmbed { model: EmbeddingModelChoice },  // Model included
    OpenAI { api_key: String },
    Custom { endpoint: String },
}

impl EmbeddingConfig {
    pub fn fastembed() -> Self {
        Self { provider: EmbeddingProvider::FastEmbed {
            model: EmbeddingModelChoice::default()
        }}
    }

    pub fn fastembed_with_model(model: EmbeddingModelChoice) -> Self {
        Self { provider: EmbeddingProvider::FastEmbed { model }}
    }
}
```

When generating embeddings, the provider uses the model-specific engine:

```rust
pub fn generate_embeddings_batch(
    &self,
    texts: &[&str],
) -> Result<Vec<Vec<f32>>, EmbeddingError> {
    match &self.provider {
        EmbeddingProvider::FastEmbed { model } => {
            let engine = FastEmbedEngine::global(*model)?;
            engine.embed_batch(texts)
        }
        // ... other providers
    }
}
```

The Trade-off Matrix

Every model choice involves trade-offs. We documented them clearly so FLIN users can make informed decisions:

| Factor | e5-small | e5-large |
| --- | --- | --- |
| Dimensions | 384 | 1024 |
| Download size | ~100 MB | ~600 MB |
| Memory usage | ~200 MB | ~1.2 GB |
| Embedding speed | ~5 ms/text | ~15 ms/text |
| Search quality | Good | Better |
| Multilingual | 100+ languages | 100+ languages |
| Best for | Most applications | RAG, precision search |

For a todo app, e5-small is more than sufficient. For a document search engine or a RAG-powered knowledge base, e5-large provides meaningfully better results.

Why Not Both?

An application could theoretically use both models -- e5-small for quick searches and e5-large for deep semantic analysis. We decided against supporting this in v1.0 for simplicity:

  1. Vector dimensions must match. A search index built with 384-dimensional vectors cannot be queried with 1024-dimensional vectors. Mixing models would require separate indices.
  2. Configuration complexity. Users would need to specify which model to use for which entity, which field, which query.
  3. Memory cost. Loading both models consumes ~1.4 GB of RAM just for embeddings.

The single-model-per-application constraint keeps things simple and predictable.
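The dimension constraint in point 1 is easy to see from the similarity computation itself: cosine similarity is simply undefined for vectors of different lengths. A minimal sketch (not FLIN's actual search code):

```rust
// Cosine similarity with an explicit dimension guard. Vectors from
// e5-small (384-dim) and e5-large (1024-dim) can never be compared.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None; // dimension mismatch: different embedding models
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None; // zero vector: similarity undefined
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    let small = vec![0.1_f32; 384];  // shaped like an e5-small embedding
    let large = vec![0.1_f32; 1024]; // shaped like an e5-large embedding
    assert!(cosine_similarity(&small, &large).is_none());
    assert!(cosine_similarity(&small, &small).is_some());
}
```

This is why switching models requires re-embedding every stored vector: the old index and the new queries live in incompatible spaces.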

The Waiting Game

The embedding model crisis was ultimately a timing issue. BGE-M3 will likely be added to fastembed in a future release. When it is, FLIN can add it as a third option with minimal code changes:

```rust
pub enum EmbeddingModelChoice {
    #[default]
    MultilingualE5Small,
    MultilingualE5Large,
    BgeM3,  // Future addition
}
```

The architecture we built -- the EmbeddingModelChoice enum, the dual singletons, the config parsing, the provider integration -- all generalize to any number of models. Adding a new model is a matter of extending the enum and adding a singleton, not redesigning the architecture.
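Assuming BGE-M3 keeps its 1024-dimensional output, the extension would touch only the match arms, not the surrounding architecture. A hypothetical sketch of that future state (the `BgeM3` variant does not exist in fastembed today):

```rust
// Hypothetical future state: BgeM3 added as a third variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum EmbeddingModelChoice {
    #[default]
    MultilingualE5Small,
    MultilingualE5Large,
    BgeM3,
}

impl EmbeddingModelChoice {
    fn dimensions(&self) -> usize {
        match self {
            Self::MultilingualE5Small => 384,
            // BGE-M3 produces 1024-dim vectors, same as e5-large
            Self::MultilingualE5Large | Self::BgeM3 => 1024,
        }
    }
}

fn main() {
    assert_eq!(EmbeddingModelChoice::BgeM3.dimensions(), 1024);
    assert_eq!(EmbeddingModelChoice::default().dimensions(), 384);
}
```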

This is a design principle we apply throughout FLIN: build the abstraction even when you only have two implementations. The abstraction costs almost nothing today and saves significant refactoring tomorrow.

The Multilingual Imperative

Both of our model choices are multilingual. This was non-negotiable. FLIN is built in Abidjan and used across francophone Africa, anglophone Africa, and beyond. A search system that only understands English would be useless for the majority of FLIN users.

The E5 model family excels at multilingual semantic similarity precisely because it was trained on parallel corpora across 100+ languages. A user searching in French finds results written in English, and vice versa. This cross-lingual capability is fundamental to FLIN's vision of a truly global programming language.

The embedding model crisis was not a bug we fixed but a constraint we navigated. The solution was not a code change but an architectural decision: choose the best available model, make the choice configurable, and design the system so better models can be swapped in without disruption.


This is Part 169 of the "How We Built FLIN" series, documenting how a CEO in Abidjan and an AI CTO designed and built a programming language from scratch.

Series Navigation:
- [168] Entity Defaults and Toggle Fix
- [169] The Embedding Model Choice Crisis (you are here)
- [170] 15 Bugs That Shaped the FLIN Language
