
Document Processing

A quick recap of recent advancements in document processing:

  1. Following up on the huge context size of the Llama 2 code model (> 100K tokens), contributors from the non-profits EleutherAI and Nous Research have extended the stock Llama 2 attention window to 128K tokens.
    1. Note that this size is only loosely comparable to OpenAI GPT, Claude, etc., because the tokenizers differ. In a quick check, I have seen token counts vary by up to a factor of 4 (Llama 2 vs. Aleph Alpha, IIRC); see the tokenizer sketch after this list.
    2. No real analysis of actual task accuracy has been done. The paper only reports perplexity, i.e. how confident the model is in its next-token predictions (see the perplexity note below), not how well it performs on downstream tasks.
    3. This should be enough to load dozens of pages directly into the LLM (“stuffing” method; sketched below).
  2. In PDFTriage: Question Answering over Long, Structured Documents, researchers from Stanford and Adobe Research describe LLM tool use to analyze PDF documents (“structured” not in the sense of “structured” vs. “unstructured” data, but in the “Tagged PDF” sense). A rough sketch of the approach follows this list.
  3. Computer security luminary Bruce Schneier prompted Claude about one of his books: LLM Summary of My Book Beyond Fear. Verdict: “The summary is pretty accurate, and so are the criticisms.” He does not seem to have uploaded the whole book, though: “Of course, this only works with older books that the LLM has ingested”, so this is likely not a testament to Claude 2’s document summarization abilities.
  4. Chain of Density prompting is making waves for summarization tasks. The idea has been ported to Claude, but the method needed adaptation; a paraphrase of the prompt is sketched below.
  5. Language Modeling Is Compression: this paper argues that language models double as general-purpose compressors, since better next-token prediction translates directly into shorter code lengths (see the sketch below). Reception on Twitter is controversial.
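
On the tokenizer caveat (item 1.1): a minimal sketch of such a quick check, assuming the `tiktoken` and `transformers` packages; the checkpoint name and input file are placeholders (and the Llama 2 weights are gated). GPT-4's tokenizer stands in here as the comparison point; my original check was against Aleph Alpha.

```python
# Compare how many tokens different tokenizers produce for the same text;
# "128K tokens of context" means different amounts of text per tokenizer.
import tiktoken
from transformers import AutoTokenizer

text = open("sample_document.txt").read()  # any longer document

gpt_encoding = tiktoken.encoding_for_model("gpt-4")
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

n_gpt = len(gpt_encoding.encode(text))
n_llama = len(llama_tokenizer.encode(text))
print(f"GPT-4:   {n_gpt} tokens")
print(f"Llama 2: {n_llama} tokens ({n_llama / n_gpt:.2f}x)")
```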
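On perplexity (item 1.2): it is the exponential of the average negative log-likelihood, i.e. how “surprised” the model is by the text on average; it says nothing directly about downstream accuracy. A self-contained illustration:

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token log-probabilities log p(x_i | x_<i)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model assigning p = 0.25 to every token is, on average, as uncertain
# as if it were choosing uniformly among 4 options:
print(perplexity([math.log(0.25)] * 100))  # -> 4.0
```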
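The “stuffing” method (item 1.3) is as simple as it sounds; a sketch assuming `pypdf` for text extraction and the OpenAI Python client, with the file and model names as placeholders:

```python
from pypdf import PdfReader
from openai import OpenAI

# Extract all pages and place them in a single prompt, relying on the
# model's large context window instead of chunking and retrieval.
reader = PdfReader("report.pdf")
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-32k",  # any long-context model
    messages=[
        {"role": "system", "content": "Summarize the following document."},
        {"role": "user", "content": full_text},
    ],
)
print(response.choices[0].message.content)
```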
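On PDFTriage (item 2): my rough paraphrase of the approach, not the paper's actual code. The model is shown the document's structural metadata and then issues targeted retrieval calls instead of receiving the full text up front; the function names and document representation below are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class StructuredDoc:
    """Toy stand-in for a parsed, tagged PDF."""
    pages: list[str] = field(default_factory=list)
    sections: dict[str, str] = field(default_factory=dict)

def fetch_pages(doc: StructuredDoc, start: int, end: int) -> str:
    """Tool: return the plain text of pages start..end (1-based)."""
    return "\n".join(doc.pages[start - 1 : end])

def fetch_section(doc: StructuredDoc, title: str) -> str:
    """Tool: return the text of a named section from the outline."""
    return doc.sections.get(title, "")

# The LLM is given the outline/metadata plus these tool signatures and
# plans calls like fetch_section(doc, "Risk Factors") to answer a question.
doc = StructuredDoc(
    pages=["page 1 text", "page 2 text"],
    sections={"Introduction": "page 1 text"},
)
print(fetch_section(doc, "Introduction"))
```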
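On Chain of Density (item 4): the technique iteratively rewrites a summary to pack in more entities at constant length. The prompt below is a condensed paraphrase of the one in the GPT-4 paper (“From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting”), not the verbatim original; the Claude port needed further changes on top.

```python
# Paraphrased Chain of Density prompt template (not the paper's exact text).
COD_PROMPT = """Article: {article}

You will generate increasingly concise, entity-dense summaries of the
article above. Repeat the following two steps five times.

Step 1: Identify 1-3 informative entities from the article that are
missing from the previously generated summary.
Step 2: Write a new, denser summary of identical length that covers
every entity and detail from the previous summary plus the missing
entities.

Guidelines: missing entities should be relevant, specific, and novel;
never drop entities from a previous summary; make space by fusing and
compressing, not by removing content.

Answer in JSON: a list of dicts with keys "Missing_Entities" and
"Denser_Summary".
"""
```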
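On Language Modeling Is Compression (item 5): the core observation is that a model assigning probability p(x_i | x_<i) to each token yields, under a near-optimal entropy code such as arithmetic coding, a message of about -Σ log2 p(x_i | x_<i) bits, so better prediction means shorter codes. A toy illustration:

```python
import math

def ideal_code_length_bits(token_probs):
    """Shannon/arithmetic-coding length for a sequence, given per-token
    probabilities p(x_i | x_<i) from any language model."""
    return -sum(math.log2(p) for p in token_probs)

# A confident model (p = 0.9 per token) compresses 100 tokens to ~15 bits;
# a uniform model over a 32k vocabulary needs ~1500 bits.
print(ideal_code_length_bits([0.9] * 100))        # ~15.2
print(ideal_code_length_bits([1 / 32000] * 100))  # ~1497
```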