Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

OpenAI-based dictation on macOS

In search of a dictation app for macOS that surpasses Apple's aging built-in voice dictation, I came across SpeechCraft. My requirements were:

- use OpenAI models, preferably
  - particularly the new GPT-4o based transcribe models
- support for any target app
- open source, preferably

SpeechCraft checks these boxes and supports the OpenAI models gpt-4o-transcribe, gpt-4o-mini-transcribe and Whisper. It works on a bring-your-own-API-key basis, so users need an OpenAI Platform account. However, a bug in its authentication against the OpenAI API rendered the app non-functional on my machine at first. I submitted this pull request on GitHub, but the upstream author has not accepted it yet. (The bugfix was done using OpenAI Codex, with this initial prompt: “When I press the recording shortcut hotkey, nothing happens. Figure out why.” It worked!)
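For orientation, here is a minimal sketch of what a bring-your-own-key transcription request to the OpenAI API looks like. This is not SpeechCraft's code, and it says nothing about what the authentication bug actually was. The helper names `build_transcription_request` and `transcribe` are made up for illustration. The endpoint, the `Authorization: Bearer` header and the `file`/`model` form fields follow the public OpenAI audio transcription API as I understand it.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key,
                                model="gpt-4o-mini-transcribe"):
    """Build (but do not send) a multipart/form-data transcription request.

    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        # Plain form field: which model should transcribe the audio.
        b"--" + boundary.encode() + crlf,
        b'Content-Disposition: form-data; name="model"' + crlf + crlf,
        model.encode() + crlf,
        # File field: the recorded audio itself.
        b"--" + boundary.encode() + crlf,
        b'Content-Disposition: form-data; name="file"; filename="'
        + filename.encode() + b'"' + crlf,
        b"Content-Type: application/octet-stream" + crlf + crlf,
        audio_bytes + crlf,
        # Closing boundary.
        b"--" + boundary.encode() + b"--" + crlf,
    ])
    req = urllib.request.Request(API_URL, data=body, method="POST")
    # The user's own API key goes into a Bearer token header.
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type",
                   f"multipart/form-data; boundary={boundary}")
    return req


def transcribe(audio_bytes, filename, api_key):
    """Send the request and return the transcript (needs network + valid key).

    Assumes the default JSON response, which carries the transcript in "text".
    """
    req = build_transcription_request(audio_bytes, filename, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

If the key is missing or malformed, the API rejects the request with an HTTP error, so checking how the `Authorization` header is built is a reasonable first step when an app like this silently does nothing.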

Using SpeechCraft is simple: once access to the microphone has been granted, an indicator is shown in the macOS menu bar. While it's recording, the indicator turns red; while it's processing, the indicator turns blue. Dictation is toggled with a customizable hotkey combination.
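The toggle behaviour described above can be pictured as a small state machine. The sketch below is my own model of it, not code taken from SpeechCraft. The class and method names are hypothetical, the idle colour ("default") is an assumption, and so is the choice to ignore the hotkey while a transcription is in flight.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


# Red and blue come from the observed behaviour; "default" is an assumption.
INDICATOR_COLOR = {
    State.IDLE: "default",
    State.RECORDING: "red",
    State.PROCESSING: "blue",
}


class Dictation:
    """Hypothetical model of the hotkey toggle and menu bar indicator."""

    def __init__(self):
        self.state = State.IDLE

    def on_hotkey(self):
        if self.state is State.IDLE:
            # First press starts recording.
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            # Second press stops recording and hands the audio to the API.
            self.state = State.PROCESSING
        # Assumption: presses during processing are ignored.
        return INDICATOR_COLOR[self.state]

    def on_transcript_ready(self):
        # Called when the transcription has come back and been inserted.
        if self.state is State.PROCESSING:
            self.state = State.IDLE
        return INDICATOR_COLOR[self.state]
```

Pressing the hotkey twice walks the indicator from red to blue, and it returns to its idle look once the transcript arrives.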

Quality-wise, even the smaller (and cheaper) gpt-4o-mini-transcribe does not disappoint and handles mixed German/English tech speak and abbreviations better than Apple's dictation. The downside: there is no streaming support and no progress indicator.