Why One Search Algorithm Is Never Enough: A Taxonomy of AI Memory Retrieval
BM25, HNSW, GraphRAG, temporal decay - a practical breakdown of the search paradigms powering AI memory systems, organized by the failure mode each one solves.
Most developers optimize model choice, context size, and output quality, but never notice the 20-35% of their bill that disappears with one config flag. Here's what prompt caching actually is, how five major providers implement it differently, and what happens when your tooling silently breaks it.
We went from nomic-embed-text to OpenAI's text-embedding-3-small and thought we'd upgraded. Turns out we'd moved from bad to mediocre. Here's the full landscape of self-hosted embedding models in 2026, organized by what you can actually run on your hardware.
The arms race to 1M-token context windows assumes more context is better. Research from Chroma showed that longer context can make results worse - and labs have kept expanding context windows anyway.