Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

Voice AI Latency

At a recent joint colloquium of the Bitkom working groups Artificial Intelligence, Metaverse and Virtual & Augmented Reality, a recurring pattern in the presentations was speech transcription using either OpenAI Whisper or Nvidia Parakeet, e.g. so that users could talk to avatars that elaborate on different aspects of a Digital Twin (factory floor, in... Read more

Starter Kit for Learning ChatGPT Sans Snake-Oil

The generative-AI gold rush has bred two things in equal measure: astonishing progress and an ocean of low-grade advice. New capabilities ship weekly; yesterday’s “best practice” can be today’s foot-gun. Below is a bookmark list that will serve as a stable launchpad.

1. Begin with primary sources

OpenAI’s own material is concise, current, and l... Read more

[UPDATED] OpenAI-based dictation on macOS

In search of a dictation app for macOS that surpasses Apple's aging built-in voice dictation, I came across SpeechCraft. My requirements were:
- use OpenAI models, preferably
- particularly the new GPT-4o based transcribe models
- any target app support
- open source, preferably

SpeechCraft checks these boxes and suppor... Read more

[UPDATED] Adapters in Apple Intelligence

With reports that Apple Intelligence V1 did not meet quality expectations, I wondered if the adapter-based architecture was to blame: depending on the use case, a different adapter would be plugged into the base model. A newer report confirms, however, that the new “Apple Intelligence V2” will also use the adapter-based approach: For spec... Read more

OpenAI Codex Web: Practical Notes

The first OpenAI hackathon in the EU was about writing Agents using Agents. We received access to Codex in our kit as part of ChatGPT Pro, but by now it also seems to be available on other paid plans, including Plus. Codex in ChatGPT is different from Codex CLI. Notes Codex CLI is powered by the codex-1-mini language model, Codex is powered b... Read more