Prompted by an interview with Jerry Liu of LlamaIndex that a colleague posted, I shared that we have been evaluating it since around March, with rather poor results, especially as the underlying corpus grows. I contributed a patch to add GPT-4 support, but as it turned out, the problems lie not with the backend LLM but, IMHO, in a fundamental design issue with the approach itself, despite the sophistication that its ~120,000 lines of code bring. What we didn't investigate: different vector databases and/or embedding-selection strategies.