Recently, a few open-source tools for converting PDFs, Office documents, and other formats into Markdown have drawn attention. Among these are MarkItDown from Microsoft, Docling from IBM Research, PyMuPDF4LLM, and the Jina AI Reader API. They aim to provide text suitable for downstream tasks, including LLM-driven analysis, without requiring manu... 14 Dec 2024 - 3 minute read
Quick notes on last week’s foundation model releases. 12 Dec 2024 - 1 minute read
[Update 2025-07-21: AWS has added Amazon Bedrock API keys. I haven’t tried them myself yet, but they could simplify the IAM setup described below.] 21 Jul 2025 (Updated) - 1 minute read
The term “AI Agent” has become increasingly prevalent in discussions about artificial intelligence, yet its meaning remains somewhat ambiguous. This ambiguity stems partly from different conceptualizations of agency across disciplines and languages. A recent LinkedIn discussion, sparked by Maximilian Seeth’s introduction to AI ethics, highlighte... 01 Dec 2024 - 4 minute read
While experimenting with the Microsoft Presidio live demo for PII detection, I found that neither model handles German-language text well when the objective is to also identify organization names. Cloning the HuggingFace space that hosts this demo lets you enable other models (by setting the environment variable ALLOW_OTHER_MODELS = 1), b... 24 Nov 2024 - 1 minute read