"RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma
This Latent.Space newsletter features a conversation with Jeff Huber of Chroma that challenges the prevailing "RAG" paradigm in AI and advocates a shift toward "Context Engineering." The discussion covers the nuances of building production-ready AI applications, emphasizing retrieval strategies and effective context management as context windows grow.
-
The Death of RAG: The newsletter argues that "RAG" (Retrieval-Augmented Generation) is an oversimplified, misleading term and advocates a more granular approach built on retrieval primitives and deliberate context management.
-
Context Engineering is King: Context engineering means deliberately curating and optimizing the information fed into LLMs, which matters more as context windows grow, to avoid "context rot" (performance degradation as input tokens increase).
-
Retrieval Strategies: The discussion highlights practical tips for improving retrieval, including hybrid recall, re-ranking, and respecting context limits, as well as the importance of golden datasets for continuous evaluation.
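A minimal hybrid-recall sketch in Python, fusing Chroma's vector search with BM25 keyword scores via reciprocal rank fusion; the tiny corpus, the collection name, and the `rank_bm25` dependency are illustrative assumptions rather than anything prescribed in the episode:

```python
# pip install chromadb rank_bm25
import chromadb
from rank_bm25 import BM25Okapi

docs = [
    "Context rot: LLM performance degrades as input tokens grow.",
    "Reciprocal rank fusion merges rankings from multiple retrievers.",
    "Chroma stores embeddings and supports metadata filtering.",
]
ids = [f"doc{i}" for i in range(len(docs))]

# Dense recall: Chroma's default embedding function handles vectorization.
client = chromadb.Client()
collection = client.create_collection("hybrid_demo")
collection.add(documents=docs, ids=ids)

# Sparse recall: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])

def hybrid_recall(query: str, k: int = 3, rrf_k: int = 60) -> list[str]:
    """Merge dense and sparse rankings with reciprocal rank fusion."""
    dense_ids = collection.query(query_texts=[query], n_results=k)["ids"][0]
    scores = bm25.get_scores(query.lower().split())
    sparse_ids = [ids[i] for i in sorted(range(len(ids)), key=lambda i: -scores[i])[:k]]

    fused: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:k]

print(hybrid_recall("why does performance drop with long context?"))
```

Reciprocal rank fusion only needs each retriever's rank order, not calibrated scores, which is why it is a common default for merging dense and sparse results.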
-
Evolving Search Infrastructure for AI: The newsletter stresses the differences between traditional search and "modern search for AI": AI consumers can digest orders of magnitude more information than human searchers, which demands modern distributed-systems design.
-
Team culture and brand: Huber emphasizes building a team of people who "ship your culture" and who are passionate about a well-crafted brand.
-
"RAG" is a confusing abstraction: It conflates retrieval and generation, hindering critical thinking about AI system design.
-
Context Rot is a real problem: LLM performance degrades as input tokens accumulate, even well within the advertised context window, making careful context selection essential.
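One mitigation is to treat tokens as a budget rather than stuffing everything retrieved into the prompt. A minimal sketch, assuming each candidate chunk already carries a relevance score and approximating tokens at roughly four characters each (both assumptions):

```python
def pack_context(candidates: list[tuple[float, str]], max_tokens: int = 2000) -> str:
    """Greedily keep the highest-scoring chunks until the token budget is spent.

    Token counts are approximated at ~4 characters per token; swap in a real
    tokenizer (e.g. tiktoken) for anything serious.
    """
    budget = max_tokens
    kept: list[str] = []
    for _score, chunk in sorted(candidates, reverse=True):
        cost = max(1, len(chunk) // 4)
        if cost <= budget:
            kept.append(chunk)
            budget -= cost
    return "\n\n".join(kept)

chunks = [(0.92, "Chroma query results..."),
          (0.55, "Unrelated release notes..."),
          (0.88, "Context rot findings...")]
print(pack_context(chunks, max_tokens=12))  # keeps only what fits, best first
```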
-
Brute force with re-ranking can be effective: casting a wide recall net and then using an LLM as a re-ranker to curate the candidate pool can be surprisingly cost-effective.
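A sketch of that brute-force pattern, using the OpenAI Python SDK purely as a stand-in LLM client (the model choice and prompt format are illustrative, not from the episode): recall generously, then let the model pick.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    """Ask an LLM to select the most relevant candidates from a recall pool."""
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    prompt = (
        f"Query: {query}\n\nCandidates:\n{numbered}\n\n"
        f"Return the indices of the {top_n} most relevant candidates, "
        "comma-separated, best first. Reply with indices only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    picks = [int(s) for s in resp.choices[0].message.content.split(",")
             if s.strip().isdigit()]
    return [candidates[i] for i in picks if i < len(candidates)][:top_n]
```

Because the re-ranker only ever sees a bounded candidate pool, its cost stays roughly flat even as the underlying corpus grows.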
-
Code indexing is evolving: regex search, embedding search, and chunk rewriting are all valuable tools, and the right blend depends on the expertise of the query writer.
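A toy sketch of that routing idea (the heuristic and thresholds are invented for illustration): identifier-shaped queries from expert users go to regex/grep, prose questions go to embedding search, and short keyword queries blend both.

```python
import re

# Identifier-shaped queries: e.g. "BM25Okapi", "client.query", "Foo::bar()"
SYMBOL = re.compile(r"^[A-Za-z_][\w.:]*\(?\)?$")

def route_code_query(query: str) -> str:
    """Pick a retrieval strategy based on how 'expert' the query looks."""
    if SYMBOL.match(query.strip()):
        return "regex"       # grep/ripgrep over the raw source
    if len(query.split()) <= 3:
        return "hybrid"      # short keyword-ish queries: blend both
    return "embedding"       # prose questions: semantic search over rewritten chunks

for q in ["BM25Okapi", "vector db setup", "how do we retry failed uploads?"]:
    print(q, "->", route_code_query(q))
```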
-
Generative Benchmarking is key for evals: generating QA pairs from chunks of your own data yields a golden dataset for testing changes to retrieval strategies.
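The bare mechanics might look like the sketch below (assumptions: the OpenAI SDK as the question generator, an already-populated Chroma collection, and recall@k as the metric; none of these specifics come from the episode):

```python
from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def make_qa_pairs(chunks: dict[str, str]) -> list[tuple[str, str]]:
    """Generate one question per chunk; the source chunk id is the gold label."""
    pairs = []
    for chunk_id, text in chunks.items():
        resp = llm.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{
                "role": "user",
                "content": f"Write one short question answerable only from:\n\n{text}",
            }],
        )
        pairs.append((resp.choices[0].message.content.strip(), chunk_id))
    return pairs

def recall_at_k(collection, pairs: list[tuple[str, str]], k: int = 5) -> float:
    """Fraction of generated questions whose source chunk a Chroma collection
    returns in its top-k results."""
    hits = 0
    for question, gold_id in pairs:
        got = collection.query(query_texts=[question], n_results=k)["ids"][0]
        hits += gold_id in got
    return hits / len(pairs)
```

Re-running `recall_at_k` after each change to chunking, embedding model, or re-ranking turns the generated pairs into exactly the kind of golden dataset mentioned above.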