On May 27, 2025, the French Government Information Service published a short video on Instagram and TikTok intended to commemorate the Resistance. A few hours later, it was deleted after historians and journalists pointed out glaring errors, notably a German soldier appearing in the middle of Liberation scenes and a Japanese flag visible in the background. The facts are public, and the deletion was confirmed on May 28. This was not merely a communication mishap; it illustrates a persistent confusion: mistaking a generative system for a knowledge base.

This misunderstanding isn’t limited to images. It also shapes how we use text. When a large language model responds with ease, we believe we’re consulting a reliable library. In reality, we’re interrogating a language engine that calculates plausible continuations, without natively exposing sources or dates. This is precisely what this article addresses: not the rhetoric of plausibility, already covered in my previous article “For AI, truth does not exist” (link here), but knowledge engineering: provenance, updates, consistency, and queryability.
My objective is simple: to clearly distinguish what an LLM can do from what a knowledge base does, then to show how the two are combined in contemporary practice, drawing on two works that have become standard references: Petroni et al. (2019) on the implicit factual memory of models, and Lewis et al. (2020) on retrieval-augmented generation (the famous RAG that many discuss and present as a miracle solution), that is, the addition of a consultable explicit memory. I’ll return to this later.
Parametric Memory, Explicit Memory
Before going further, we need to clarify what we’re talking about. Behind the word “memory,” we often confuse two very different realities. It’s somewhat like putting an individual’s memories and a library’s archives in the same category: both preserve information, but not in the same way, nor for the same purposes.
An LLM is fundamentally a gigantic parametric mechanism. Its training adjusts billions of coefficients so that, faced with a sequence of words, it predicts the most probable continuation. This “memory” is internal, frozen in parameters, and sometimes gives the illusion of encyclopedic knowledge.
But a knowledge base doesn’t function this way. It relies on verifiable elements: who says what, on what date, in what context. It can be updated, browsed, and its precise source extracted. It’s alive, revisable, transparent. Exactly the opposite of parametric memory, which is opaque and static.
This contrast is fundamental. A model’s parametric memory resembles a vast statistical landscape. Some facts are well anchored there, like well-trodden paths, because they appear often in the training data. Others, rarer, get lost in the tall grass and become difficult to reach. Nothing guarantees the path leads to the right place, much less that there aren’t two contradictory trails.
Explicit memory, on the other hand, is a consultable library. You add a book, remove another, annotate the margins. It functions like an archive room where each document is dated, identified, and where you can go back to verify. If parametric memory is a blurry landscape carved in stone, explicit memory is an open notebook that can be kept current. It’s what makes updating, traceability, and knowledge governance possible.
It’s precisely this gulf between parametric and explicit memory that researchers began exploring. Petroni et al. (2019) proposed a simple but revealing test: cloze prompts. The idea involves presenting the model with a sentence containing a gap to fill, for example “Paris is the capital of [MASK]”. A human would immediately see “France.” The model must produce the missing word based on what it retained from its training. On certain common facts, it succeeds. But as soon as we move away from obvious facts or touch on rare knowledge, performance drops rapidly. And crucially, the model can’t say where it learned this information, or when it was valid. In short, parametric memory knows how to “recite,” but it doesn’t know how to “cite.”
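To make the idea concrete, here is a minimal sketch of this kind of cloze probing, assuming the Hugging Face transformers library is installed and using the bert-base-cased checkpoint (an illustrative choice, not the exact benchmark setup of Petroni et al.):

```python
# Minimal cloze-probing sketch. Assumes the Hugging Face "transformers"
# library is installed; bert-base-cased is an illustrative choice, not the
# exact setup used by Petroni et al. (2019).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# A frequent fact: the model usually fills the gap correctly.
for prediction in fill_mask("Paris is the capital of [MASK].", top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))

# The output is only a ranked list of plausible tokens with probabilities:
# nothing in it says where the "fact" was learned, or when it was valid.
```

The interesting part is the shape of the output: a probability distribution over tokens, with no source and no date attached. That is reciting, not citing.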
RAG: miracle solution or necessary patch?
Faced with these limitations, another approach quickly emerged: retrieval-augmented generation, or RAG, described by Lewis et al. (2020).
The idea appears simple but deserves detailed explanation. Instead of relying solely on the model’s parametric memory, we open access to external memory, a queryable document base. The mechanism works in two stages. First, the system translates the user’s question into a query, then searches through a collection of texts (for example, a Wikipedia index or an internal base of scientific articles). It extracts the most relevant passages. Then, these excerpts are inserted directly into the context provided to the model, which can use them to formulate its response.
Put differently, it’s like asking a very eloquent person a question: first you place in their hands a few index cards pulled from the archives, then you ask them to improvise an answer based on these. The style remains that of the model, but the content is now enriched by documents it didn’t have in internal memory.
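In code, the two-stage loop can be sketched very simply. The example below uses TF-IDF retrieval purely for illustration (real RAG systems typically rely on dense embeddings and a neural retriever), the documents and question are invented, and the generation step is left as a placeholder:

```python
# A minimal sketch of retrieve-then-generate. TF-IDF stands in for the
# retriever purely for illustration; real systems usually use dense
# embeddings. The documents and question are invented for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The Liberation of Paris took place in August 1944.",
    "The French Resistance included many clandestine networks.",
    "The Eiffel Tower was completed in 1889.",
]
question = "When was Paris liberated?"

# Stage 1: retrieval. The question becomes a query, and the documents are ranked.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([question])
scores = cosine_similarity(query_vector, doc_vectors)[0]
top_passages = [documents[i] for i in scores.argsort()[::-1][:2]]

# Stage 2: generation. The retrieved passages are inserted into the prompt;
# the model then formulates an answer from them, in its own words.
prompt = (
    "Answer using only the passages below.\n\n"
    + "\n".join(f"- {p}" for p in top_passages)
    + f"\n\nQuestion: {question}"
)
print(prompt)  # in a real system, this prompt is sent to the LLM
```

The point is structural: the knowledge lives in the document list, outside the model, and the model only ever sees it through the prompt.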
This dual mechanism, search then generation, changes many things. It allows updating knowledge without retraining the model, correcting or adding facts as the base evolves, and crucially establishing traceability: we can indicate “this passage draws on that particular document.” But be careful: traceability does not guarantee fidelity. Even fed with identifiable sources, the model continues functioning through probabilistic assembly. It can distort, omit, or reinterpret the information provided. Provenance becomes possible, but distortion remains probable.
In other words, where parametric memory is closed and opaque, RAG opens a window to explicit and living memory. But this window remains filtered by the system’s very nature: what comes out is never an exact copy of what goes in.
Lewis et al. showed that this approach significantly improves performance on so-called “knowledge-intensive” tasks, those requiring many precise and varied facts. But they never claimed this transformed an LLM into a full-fledged knowledge base.
It’s an improvement, yes, but at the cost of a workaround: a model that still speaks through probability, but to which we’ve grafted external memory to compensate for its gaps. An effective patch, but a patch nonetheless.
First, because it doesn’t change the model’s deep nature, which continues reasoning through word probability rather than fact verification. This mechanism privileges formal coherence over truth. When a response is correct, it’s the effect of a fortunate alignment of probabilities, not the result of a desire to verify. And when it’s false, it’s not a lie, but the ordinary functioning of the system. Wanting to completely eliminate these failures would amount to demanding these systems be something other than what they are. We can reduce the frequency of errors, never abolish them.
Second, because the ensemble’s solidity still depends on the quality of the patch glued on top: if the document base is incomplete or poorly indexed, the model will generate gap-ridden or biased responses, simply dressed up with references. And even with excellent sources, nothing guarantees the model will interpret them faithfully.
Finally, because knowledge isn’t integrated into the system’s core. It remains grafted to the side, like an indispensable but precarious crutch that compensates for a weakness without ever resolving it.
What this changes in practice
If RAG is merely a patch, it’s because it fills a structural flaw: the difference between parametric and explicit memory. This distinction is no technical detail. It determines what a model can actually do, and what would be dangerous to entrust to it.
Any attentive user has already noticed it: these theoretical differences translate into daily usage. We observe them in three typical situations.
Updates. In a purely parametric model, knowledge is frozen at the moment of training. If a law changes, if medical data is revised, if a biography is enriched, nothing moves until the model is retrained, a long, costly, and risky operation. With explicit memory, we simply update the source, and the model can use it immediately. It’s the difference between an inscription carved in stone and a page we correct in a file.
Provenance. Fluent discourse isn’t enough in sensitive domains: medicine, law, education, public communication. We must be able to say where information comes from and who authored it. Yet an LLM doesn’t cite its sources; it guesses them statistically. RAG, however, allows linking a response to an identifiable document. It’s the difference between listening to a story told from memory and consulting an archive where we can verify the signature.
Rare cases. Parametric memory favors frequent facts. The more widespread a piece of information is in the training data, the more likely it is to emerge correctly. But as soon as we enter the “long tail” (rarely cited facts, highly specific details, niche knowledge), the machine stumbles. Access to an updated document base reduces this frequency bias: the model no longer relies solely on its statistical memories but draws on precise texts.
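The first two points, updates and provenance, come down to a very simple property of explicit memory: each entry carries its source and its date, and correcting the store takes effect immediately, with no retraining. A minimal sketch, with deliberately simplified, illustrative records:

```python
# Explicit memory in miniature: every entry carries a source and a date, and
# updating the store takes effect immediately, with no retraining. The records
# below are simplified illustrations, not authoritative legal references.
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    source: str
    date: str  # when this version of the fact was published

store = {
    "speed_limit_rural_fr": Record(
        text="The default speed limit on French rural roads is 80 km/h.",
        source="Decree of June 2018",
        date="2018-07-01",
    ),
}

def answer(key: str) -> str:
    rec = store[key]
    # The response can point back to an identifiable, dated source.
    return f"{rec.text} (source: {rec.source}, {rec.date})"

print(answer("speed_limit_rural_fr"))

# If the rule changes, we update the record, not the model.
store["speed_limit_rural_fr"] = Record(
    text="Departments may restore a 90 km/h limit on certain roads.",
    source="Mobility Orientation Law (LOM)",
    date="2019-12-24",
)
print(answer("speed_limit_rural_fr"))
```

Nothing comparable exists inside parametric memory: there is no record to edit, only billions of coefficients.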
For exploring ideas, reformulating text, or popularizing concepts, the model alone may suffice. But as soon as it involves advancing facts or being very precise in specialized domains, it must rely on explicit, verifiable, and updated memory.
A question to keep in mind
The mishap at the Government Information Service wasn’t just a visual incident. It revealed something deeper: the temptation to mistake a machine that produces language for a source of knowledge.
An LLM is not a knowledge base. It excels at unrolling coherent sentences, exploring ideas, reformulating content. But it verifies nothing, cites nothing, doesn’t update itself. Its memory is parametric, opaque, and frozen.
A knowledge base, however, lives through its additions, corrections, and traceability. It doesn’t just recite, it enables citation. It doesn’t merely produce discourse, it guarantees knowledge governance.
Confusing the two means risking applauding a text’s fluidity while believing we hold truth, or commemorating the Resistance with flags that never flew there.
So a question remains for each of us: what do we do with these tools? Do we let them flatter our need for immediate answers, at the risk of settling for coherence without truth, or do we choose to use them as levers for thinking, questioning, and verifying?