RAG in practice

Studies by Salesforce Research and Google DeepMind, as well as my own experiments, have previously cast fundamental doubt on RAG. Now Richard Meng has come forward with practical confirmation:

We’ve spoken with 30 companies who developed RAG-based chatbots on PDF documents. Every single one has failed

The problems he shares are familiar:

1) In vector space, “non-dairy products” is often closer to “milk” than “meat,” this is a fundamental flaw of vector embedding search because they’re very lossy.
2) Splitting documents into smaller chunks disrupts coherence, breaking cross-references and context. […]
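To make the second point concrete, here is a minimal sketch of naive fixed-size chunking, assuming a word-count splitter (a simplification; real pipelines often split by tokens, sentences, or layout). The contract text and the `chunk_words` helper are hypothetical. Even though each chunk here happens to be a clean sentence, the chunk a retriever could plausibly return for a question about late delivery only refers to the penalty and the due date; both facts sit in other chunks.

```python
def chunk_words(text: str, words_per_chunk: int) -> list[str]:
    """Naive fixed-size chunking: consecutive runs of `words_per_chunk` words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# Hypothetical three-sentence contract excerpt (illustrative only).
doc = (
    "Clause 7 sets the late-delivery penalty at 5 percent per week. "
    "Deliveries are due on the first business day of each month. "
    "Any delay beyond that date triggers the penalty in Clause 7."
)

chunks = chunk_words(doc, 11)
for i, chunk in enumerate(chunks):
    print(i, chunk)

# The last chunk mentions "the penalty" and "that date", but the values they
# point to were split off into other chunks.
answer_chunk = chunks[-1]
print("5 percent" in answer_chunk)           # False
print("first business day" in answer_chunk)  # False
```

Overlapping windows or larger chunks soften this, but any fixed cut can still separate a reference from its target.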

And he concludes:

If your documents are small: just load them directly into the LLM context. […] Chatting on documents must be redesigned.

Documents don’t have to be small, actually: Anthropic has long supported 200K-token contexts (less on AWS), and Google’s LLMs are specced at context lengths of up to 2M tokens.
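Whether “just load them into the context” is an option comes down to a size check. The sketch below is a rough estimate under stated assumptions, not a real tokenizer: it assumes about 4 characters per token (a common rule of thumb for English text), takes the window sizes from the figures above, and reserves a hypothetical 8,000 tokens for the prompt and the answer. All names are hypothetical.

```python
# Assumed context windows in tokens, taken from the figures quoted above;
# real limits vary by model, version, and provider.
CONTEXT_WINDOWS = {
    "anthropic": 200_000,
    "google": 2_000_000,
}

CHARS_PER_TOKEN = 4  # rough heuristic for English text; an assumption, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count; ceiling division so any non-empty text counts as >= 1 token."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], window: int, reserve: int = 8_000) -> bool:
    """True if all documents plus `reserve` tokens for prompt and answer fit in `window`."""
    total = sum(estimate_tokens(d) for d in documents)
    return total + reserve <= window


# Stand-in for the extracted text of a hypothetical 300-page PDF (~3,000 chars per page).
pages = ["x" * 3_000] * 300
print(fits_in_context(pages, CONTEXT_WINDOWS["anthropic"]))  # False (~225K tokens)
print(fits_in_context(pages, CONTEXT_WINDOWS["google"]))     # True
```

For production use, the provider's own token-counting facility would replace the character heuristic.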