
LLM Memory & Retrieval

How LLMs find information, why they forget between sessions, and how to solve that.

The memory problem

LLMs don't remember anything by default. Every session starts from zero.

Ask an LLM to book a trip to Taipei. Close the session. Open a new one. Say "buy the ticket." It has no idea what trip you're talking about. The previous conversation is gone.

This happens because LLMs are stateless. They process the text in front of them — the context window — and produce output. Nothing persists between sessions unless you explicitly save it somewhere.
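A minimal sketch of what statelessness means in practice. The `stateless_llm` function below is a toy stand-in (not a real API): like a real model call, it can only act on the messages it is handed, so a fresh session with an empty history knows nothing about the old one.

```python
def stateless_llm(messages):
    """Toy stand-in for an LLM API call: it sees only the messages passed in."""
    context = " ".join(m["content"] for m in messages)
    if "Taipei" in context:
        return "Booking the Taipei ticket."
    return "Which trip? I have no record of one."

# Session 1: the trip is inside the context window.
session_1 = [{"role": "user", "content": "Book a trip to Taipei"}]
print(stateless_llm(session_1))  # knows about Taipei

# Session 2: a brand-new message list. The model has no idea.
session_2 = [{"role": "user", "content": "Buy the ticket"}]
print(stateless_llm(session_2))  # "Which trip? I have no record of one."
```

Real chat APIs work the same way: every request carries the full conversation, and anything you don't resend simply doesn't exist for the model.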

Analogy: An LLM is like a brilliant contractor who shows up every morning with total amnesia. They can do incredible work if you brief them — but yesterday's briefing is gone. You need to leave notes on the desk.

What survives and what doesn't

The session boundary is the key concept.

Session 1
- You: "Analyze this transcript"
- LLM reads your instructions
- LLM reads the transcript
- LLM writes a summary
- LLM updates related pages
- Session ends. LLM's internal state = gone.

Survives:
- Instructions file
- Source documents
- Output files

Dies:
- LLM's "understanding"
- Conversation context
- Internal vectors

Session 2
- You: "What do we know about X?"
- LLM reads the same instructions
- LLM reads the output from last time
- LLM answers from those files

Works because the files persisted, not because it "remembers."
Key insight: The LLM doesn't remember — it re-reads. Session 2 works because someone saved files that the LLM can read again. The files ARE the memory. The LLM is just a reader.
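The re-read cycle is simple enough to sketch directly. This toy example (file names are invented) persists Session 1's output to disk, then shows Session 2 rebuilding its context from nothing but that file:

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Session 1 ends: the LLM's output is written to a file before everything
# in its head evaporates.
(workdir / "summary.md").write_text("Summary: the Q3 budget was approved.\n")

# Session 2 starts from zero: context is rebuilt by re-reading the file.
saved = (workdir / "summary.md").read_text()
prompt = (
    "Previously saved notes:\n"
    f"{saved}\n"
    "Question: What do we know about the budget?"
)
print(prompt)
```

Nothing here is memory in the model; the "memory" is the file plus the habit of re-reading it at session start.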

Three approaches to persistence

Different ways to leave notes on the desk for tomorrow's amnesiac contractor.

Approach 1: File-based memory

Write instructions and outputs to files. The LLM reads them at the start of each session. You control exactly what persists and how it's structured.

Examples: README files, CLAUDE.md, structured markdown wikis, config files. Human-readable, version-controllable, transparent.

Approach 2: Vector-based memory

Embed all conversations and documents into a vector database. At the start of each session, query the database for relevant context and preload it.

Examples: Pinecone, Chroma, Weaviate. Automatic, scales to millions of chunks, but opaque — you don't control what gets retrieved.
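The core mechanic behind all of these products is the same: embed text into vectors, then return the stored items closest to the query vector. A toy sketch with a word-count "embedding" (real systems use learned embedding models and a proper database, not this):

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag of lowercase words. Real embeddings are learned."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Notes on booking flights to Taipei",
    "Quarterly budget summary for the design team",
    "Recipe collection: noodle soups",
]
index = [(d, embed(d)) for d in docs]  # the "vector database"

query = embed("taipei flight booking")
best = max(index, key=lambda pair: cosine(query, pair[1]))[0]
# best -> "Notes on booking flights to Taipei"
```

Note the opacity the table below points at: the system returns whatever scores highest, and you only find out what that was after the fact.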

Approach 3: Platform memory

The LLM platform itself stores memory entries between sessions. You tell it "remember this" and it saves a note internally.

Examples: ChatGPT's memory feature, Claude's memory files. Convenient but limited — you're trusting the platform to decide what's relevant.

|                | File-based             | Vector-based                  | Platform      |
|----------------|------------------------|-------------------------------|---------------|
| Control        | Full                   | Low                           | Medium        |
| Effort         | High (manual curation) | Low (automatic)               | Low           |
| Transparency   | Total                  | Low (similarity scores)       | Medium        |
| Scale          | Hundreds of pages      | Millions of chunks            | Limited       |
| Infrastructure | Just files             | Database + embedding pipeline | None          |
| Portability    | Works anywhere         | Vendor-locked                 | Vendor-locked |

The two knobs: retrieval vs behavior

When you set up any LLM-based system, you're turning two separate knobs. They are independent controls: you can have either without the other, or both.

What to look at

The retrieval problem. Out of all available information, which pieces does the LLM read?

Examples: vector search, index lookup, file selection.

How to process it

The behavior problem. Once the LLM has the information, what should it do with it?

Examples: instructions, schemas, rules.
|                     | What to look at (retrieval)              | How to process (behavior)          |
|---------------------|------------------------------------------|------------------------------------|
| File-based system   | You pick the file, or LLM reads an index | Instructions file defines the rules |
| Vector system (RAG) | Vector DB picks the chunks               | Still needs instructions/prompts   |
| No system           | User pastes text into chat               | Whatever the LLM defaults to       |
Common mistake: Assuming a vector database replaces instructions. Even with perfect retrieval, the LLM still needs to know: what format should the output be? What rules apply? What should it never do? Retrieval gets the right data in. Instructions tell it what to do with that data. You need both.
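The two knobs meet in the final prompt. A hypothetical assembly function (all names here are invented for illustration) makes the separation concrete: retrieval fills the context slot, instructions fill the behavior slot, and neither substitutes for the other:

```python
def build_prompt(retrieved_chunks, instructions, question):
    """Knob 1 (retrieval) supplies retrieved_chunks; knob 2 (behavior)
    supplies instructions. Both end up in the same prompt."""
    context = "\n".join(retrieved_chunks)
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"

prompt = build_prompt(
    retrieved_chunks=["Flight to Taipei departs 2024-06-01 at 09:30."],
    instructions="Answer in one sentence. Never invent booking details.",
    question="When is the flight?",
)
print(prompt)
```

Drop the `retrieved_chunks` and the model has nothing to answer from; drop the `instructions` and it answers in whatever format and with whatever liberties it defaults to.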

Vectors inside the LLM vs a vector database

These sound the same but are completely different things.

Vectors inside the LLM

Every token gets converted to a vector during processing. Attention is computed in vector space. This is just how neural networks work.

Ephemeral. These vectors exist during your conversation and are gone when the session ends. You don't control or see them.

A vector database

A separate piece of infrastructure you build and host. You embed your documents, store the vectors on disk, and query them later.

Persistent. Survives session restarts. You decide what goes in. It's like Postgres, but for similarity search instead of exact queries.
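The defining property is survival across process restarts. A toy sketch of that property, using a JSON file in place of a real vector store and 2-d vectors in place of real embeddings:

```python
import json
import tempfile
from pathlib import Path

store = Path(tempfile.mkdtemp()) / "vectors.json"

# "Session 1": embed documents (toy 2-d vectors here) and persist to disk.
vectors = {"doc-1": [0.1, 0.9], "doc-2": [0.8, 0.2]}
store.write_text(json.dumps(vectors))

# "Session 2": a fresh process. The LLM's internal vectors are gone,
# but these survived because they live on disk, not in the model.
stored = json.loads(store.read_text())
query = [0.75, 0.25]
nearest = min(
    stored,
    key=lambda name: sum((a - b) ** 2 for a, b in zip(stored[name], query)),
)
# nearest -> "doc-2"
```

A real vector database adds indexing, approximate search, and scale on top of this, but the contract is the same: you put vectors in, they persist, and you query them later.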

Analogy: Saying "the LLM uses vectors" is like saying "my car uses pistons." True, but you don't manage the pistons. A vector database is a separate engine you'd buy, install, and maintain.
Part of a series: Wiki vs Vector Database covers the fundamental difference. Knowledge System Architecture covers designing the full system.