A quick recap of recent advancements in document processing below.
- Following up on the huge context window of the Llama 2 code model, Code Llama (> 100K tokens), contributors from the non-profits EleutherAI and Nous Research have extended the attention window of stock Llama 2 to 128K tokens.
- Note that this size is only roughly comparable to OpenAI GPT, Claude, etc., because the tokenizers differ. In a quick check I have seen token counts vary by up to a factor of ~4 for the same text (Llama 2 vs. Aleph Alpha, IIRC); see the tokenizer sketch after this list.
- No real analysis has been done with respect to actual accuracy; the paper only reports perplexity, i.e. how confident the model is when predicting the next token (see the short sketch after this list).
- (This should be enough to load dozens of pages directly into the LLM's context; the “stuffing” method.)
- In PDFTriage: Question Answering over Long, Structured Documents, researchers from Stanford and Adobe Research describe LLM tool use for analyzing PDF documents. (“Structured” not in the “structured vs. unstructured data” sense, but in the “Tagged PDF” sense.) A rough sketch of the approach follows this list.
- Computer-security luminary Bruce Schneier prompted Claude about one of his books: LLM Summary of My Book Beyond Fear. Verdict: “The summary is pretty accurate, and so are the criticisms.” He doesn't seem to have uploaded the whole book, though: “Of course, this only works with older books that the LLM has ingested”, so this is likely not a testament to Claude 2's document-summarization abilities.
- Chain of Density prompting is making waves for summarization tasks: the model rewrites its summary over several rounds, each time folding in missing salient entities without increasing the length. The idea has been ported to Claude, but the prompt needed adaptation; a paraphrased version follows this list.
- The paper Language Modeling Is Compression argues that a good language model doubles as a strong general-purpose compressor (via arithmetic coding). Reception on Twitter is controversial. A tiny illustration of the underlying identity closes this recap.
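
On the tokenizer point: a minimal sketch for comparing token counts across tokenizers. The model names and the sample file are my own choices for illustration (the Llama 2 repo on Hugging Face is gated), not something taken from the items above.

```python
from transformers import AutoTokenizer
import tiktoken

text = open("report.txt").read()  # any longer document

# Llama 2 tokenizer (gated repo; requires accepting Meta's license on Hugging Face)
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# OpenAI tokenizer for GPT-4 via tiktoken
gpt_enc = tiktoken.encoding_for_model("gpt-4")

print("Llama 2 tokens:", len(llama_tok.encode(text)))
print("GPT-4 tokens:  ", len(gpt_enc.encode(text)))
```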
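
On perplexity: it is just the exponentiated average negative log-likelihood per token, i.e. a measure of how confidently the model predicts the next token, not of downstream answer accuracy. A minimal illustration:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns every token probability 0.25 has perplexity 4,
# regardless of whether its answers are factually correct.
print(perplexity([math.log(0.25)] * 10))  # -> 4.0
```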
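
On PDFTriage: a rough sketch of the tool-use idea as I read the paper. The index layout and function names here are illustrative assumptions, not the authors' code; the paper defines a similar set of fetch functions over the document's structure.

```python
from dataclasses import dataclass, field

@dataclass
class PDFIndex:
    """Structured representation extracted from a (tagged) PDF."""
    sections: dict = field(default_factory=dict)  # section title -> text
    tables: dict = field(default_factory=dict)    # table id -> text/CSV
    pages: list = field(default_factory=list)     # page index -> text

    # Small tools the LLM can call instead of reading the whole PDF.
    def fetch_section(self, title: str) -> str:
        return self.sections.get(title, "")

    def fetch_pages(self, start: int, end: int) -> str:
        return "\n".join(self.pages[start:end])

    def fetch_table(self, table_id: str) -> str:
        return self.tables.get(table_id, "")

# Flow: 1) show the model only the outline (section titles, table ids, page count),
#       2) let it pick a fetch tool for the question at hand,
#       3) answer from the fetched snippet rather than the full document.
```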
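
On Chain of Density: a paraphrase of the prompt pattern from memory, not the authors' exact wording; the published prompt targets GPT-4, and the Claude port reportedly required rewording along these lines.

```python
COD_PROMPT = """Article: {article}

You will write increasingly dense summaries of the article above.
Repeat the following two steps 5 times:
1. Identify 1-3 informative entities from the article that are missing
   from the previous summary.
2. Write a new summary of the same length that keeps every entity from
   the previous summary and adds the missing ones.
Never drop entities and never increase the length; make room by fusing
and compressing the existing wording.
Return the 5 summaries as a numbered list."""

prompt = COD_PROMPT.format(article=open("article.txt").read())
```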
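
On the compression paper: the link between language modeling and compression is that an (idealized) arithmetic coder spends about -log2 p(token) bits per token under the model, so better next-token prediction means shorter compressed output. A tiny illustration:

```python
import math

def compressed_bits(token_probs):
    """Ideal code length, in bits, for a token sequence under the model."""
    return sum(-math.log2(p) for p in token_probs)

# 1000 tokens each predicted with probability 0.9 cost ~152 bits in total,
# versus several kilobytes of raw text.
print(compressed_bits([0.9] * 1000))  # ~152.0
```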