Studies by Salesforce Research and Google DeepMind, as well as my own experiments, have previously cast fundamental doubt on RAG. Now Richard Meng has come forward with practical confirmation:
We’ve spoken with 30 companies who developed RAG-based chatbots on PDF documents. Every single one has failed
The problems he shares are familiar:
1) In vector space, “non-dairy products” is often closer to “milk” than “meat,” this is a fundamental flaw of vector embedding search because they’re very lossy. 2) Splitting documents into smaller chunks disrupts coherence, breaking cross-references and context. […]
And he concludes:
If your documents are small: just load them directly into the LLM context. […] Chatting on documents must be redesigned.
Documents don’t have to be small, actually: Anthropic has long supported 200K-token contexts (less on AWS), and Google’s LLM context lengths are specced at 2M tokens.
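To make the "just load it into the context" advice concrete, here is a minimal Python sketch that checks whether a document plausibly fits into a given context window and, if so, builds a prompt containing the whole text. Everything in it is an assumption for illustration: the 4-characters-per-token ratio is only a rough heuristic for English text (a real tokenizer will give different counts), the 8,000-token reserve for question and answer is arbitrary, and the function names and prompt layout are made up. Only the 200K and 2M limits come from the figures above.

```python
# Rough heuristic (an assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4

# Context limits mentioned above, in tokens.
CONTEXT_LIMITS = {"200K": 200_000, "2M": 2_000_000}


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, limit: int, reserve: int = 8_000) -> bool:
    """Return True if the document plus a reserve for question and answer fits."""
    return estimate_tokens(text) + reserve <= limit


def build_prompt(document: str, question: str) -> str:
    """Put the whole document into the prompt, with no chunking or retrieval."""
    return f"<document>\n{document}\n</document>\n\nQuestion: {question}"


if __name__ == "__main__":
    # Stand-in for extracted PDF text: 1,000,000 characters, roughly 250K tokens.
    doc = "x" * 1_000_000
    for name, limit in CONTEXT_LIMITS.items():
        print(name, fits_in_context(doc, limit))
```

With this stand-in document, the estimate exceeds the 200K window but fits comfortably into 2M. In practice you would replace the heuristic with the token counter of the model you actually use.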