Why One Search Algorithm Is Never Enough: A Taxonomy of AI Memory Retrieval

Every AI memory system eventually runs into the same problem: retrieval is hard, and no single algorithm handles all the cases well. The history of search is really a history of each approach failing in a specific way, and someone building something new to compensate.

What follows is a taxonomy of the major paradigms, organized not by alphabet or age, but by the failure mode each one was invented to solve.


1. Lexical Search - The Reliable Baseline

Core algorithms: TF-IDF, BM25, inverted index

The oldest family. Documents are indexed by the words they contain, and queries match by exact term overlap, weighted by frequency. BM25 (Best Match 25) is the gold standard here. It accounts for document length and term frequency in a way that outperforms naive word counting.

What it solves: Exact recall. If you know the word and the word is in the document, BM25 will find it.

Where it breaks: Synonym blindness. “Car” and “automobile” are different strings. BM25 returns zero results for one if only the other appears. It has no concept of meaning, only characters.

Used in: ElasticSearch, Solr, most document search systems built before 2019.


2. Vector / Semantic Search - Teaching Search to Understand Meaning

Core algorithms: HNSW, IVF, ScaNN, DiskANN

Text gets embedded into a high-dimensional vector space where similar meanings land near each other geometrically. Retrieval becomes a nearest-neighbor geometry problem: find the vectors closest to the query vector.

The brute-force version (compare query to every stored vector) is exact but slow. Production systems use Approximate Nearest Neighbor (ANN) algorithms that trade a tiny accuracy loss for massive speed:

  • HNSW (Hierarchical Navigable Small Worlds) - builds a multi-layer graph; the dominant production algorithm, used in Pinecone, Weaviate, Chroma
  • IVF (Inverted File Index) - clusters vectors into buckets, searches only likely clusters; used in FAISS
  • ScaNN (Google) - optimized for dot-product similarity at scale
  • DiskANN - HNSW variant for datasets too large to fit in RAM

What it solves: Synonym and paraphrase blindness. “Car” and “automobile” land near each other in vector space even if they never co-occur.

Where it breaks: Exact keyword recall. A vector search for “BM25” may surface tangentially related documents about “retrieval algorithms” while missing the exact document that defines BM25. Numbers and IDs are especially problematic.

Used in: Pinecone, Weaviate, Qdrant, pgvector, mem0.


3. Graph-Based Search - Reasoning About Relationships

Core algorithms: Graph traversal (BFS/DFS), PageRank, GraphRAG, Cypher/SPARQL queries

Memory is stored as a knowledge graph: entities as nodes, relationships as edges. Retrieval traverses edges rather than ranking documents by similarity. To find “what does Justin know about vector search?”, you traverse from the Justin node through a “knows about” edge to find connected concepts.

GraphRAG (Microsoft) extends this by clustering the graph into communities, summarizes each at multiple levels of abstraction, and retrieves based on community relevance. This enables synthesis across many documents that wouldn’t surface in any single similarity search.

What it solves: Relationship reasoning. Vector search tells you which memories are similar; graph search tells you which memories are connected. The difference matters when you need to answer “how does X relate to Y?” rather than “what’s most like X?”

Where it breaks: Unstructured text. Building a knowledge graph requires either manual curation or expensive LLM-based extraction. It doesn’t generalize as easily as embedding a document.

Used in: Neo4j, mem0’s graph layer, GraphRAG, Zep.


4. Temporal and Importance-Weighted Search - Memory That Forgets Intelligently

Core mechanisms: Recency decay, access frequency boosting, importance scoring, reflection/consolidation

In human memory, not all memories are equal. Recent events are more accessible. Frequently recalled memories become stronger. This family of algorithms applies the same logic to AI systems.

  • Recency decay - an exponential decay function downgrades older memories over time, modeling the Ebbinghaus forgetting curve
  • Access frequency - memories retrieved often get boosted, strengthening useful information
  • Importance scoring - memories flagged as significant at write time (by rule or model judgment) stay surfaced longer
  • Reflection / consolidation - periodically merge and compress older memories into higher-level summaries, reducing noise while preserving signal (from the Generative Agents paper by Park et al., 2023)

The problem it solves: Relevance drift. Without decay, a memory from three years ago about an old preference competes equally with a memory from yesterday about a changed preference. Temporal weighting keeps context current.

The failure mode: Decay rates are hard to tune. Something important mentioned once, long ago, may decay below the retrieval threshold even when it’s highly relevant.

Used in: Generative Agents (Stanford/Google, 2023), mem0, MemGPT/Letta.


5. Hybrid and Multi-Stage Search - The Production Reality

Core patterns: BM25 + vector fusion, MMR, retrieve-then-rerank, ColBERT, HyDE, RAPTOR

No single paradigm survives contact with production. Real systems layer multiple signals and refine in stages.

  • Hybrid search (BM25 + vector): Both run in parallel; scores merge via Reciprocal Rank Fusion (RRF) or learned weighting. Gets exact recall from BM25 and semantic recall from vectors simultaneously.
  • MMR (Maximal Marginal Relevance): After retrieval, penalize results too similar to already-selected ones. Reduces redundancy, improves coverage.
  • Two-stage retrieve-then-rerank: ANN retrieves a cheap top-100 candidate set; a cross-encoder reranker (Cohere Rerank, ColBERT) does expensive but highly accurate scoring on that shortlist.
  • HyDE (Hypothetical Document Embedding): Generate a hypothetical ideal answer with the LLM first, embed that, then search with it. Closes the gap between short query distributions and long document distributions.
  • RAPTOR: Recursively summarize documents into a tree; retrieve from multiple levels of abstraction depending on the query’s specificity.

What it solves: All of them. BM25 catches what vector search misses. Reranking corrects what ANN gets wrong. Hybrid is how you get reliable retrieval across all query types.

Used in: Production RAG at most major AI companies; Weaviate, Vespa, Cohere.


6. The Attention Alternative

As context windows expand toward millions of tokens, “retrieval” becomes a spectrum. The model’s own attention mechanism can function as implicit retrieval when everything fits in context. No external index required.

Context windows are expensive though, and models have a “lost-in-the-middle” problem: information buried in the middle of a long context gets under-attended. Strategic placement of important memories at the beginning or end of context is itself a form of retrieval optimization.


The Decision Tree

NeedReach For
Exact keyword or ID matchBM25 / lexical
Semantic similarity, paraphrase-robustHNSW vector search
Relationship reasoning between entitiesKnowledge graph traversal
Recency-sensitive, current contextDecay-weighted scoring
Production robustness across query typesHybrid BM25 + vector + rerank
Cross-document synthesisGraphRAG or RAPTOR
Diverse, non-redundant resultsMMR post-processing

For any serious AI memory system, you need most of these, layered. The question is which layers matter most for your specific retrieval patterns, and which failure modes you can afford to accept.

← All posts