Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

GPT-4o image generation

GPT-4o image generation has been released as part of ChatGPT and Sora. It supersedes the Dall-E 3 model, which was originally released in October 2023. For now, however, Dall-E 3 remains the best OpenAI image generation model available via their API.
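For reference, here is a minimal sketch of what requesting an image from Dall-E 3 over the API could look like. It uses only the Python standard library and assumes OpenAI's public Images endpoint (`/v1/images/generations`) with the `dall-e-3` model name. The helper `build_image_request`, the example prompt, and the `OPENAI_API_KEY` environment variable are my own illustrative choices, not something from the release notes.

```python
import json
import os
import urllib.request

# Assumed endpoint of OpenAI's public Images API.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, api_key: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) an HTTP request asking Dall-E 3 for one image.

    Hypothetical helper for illustration only.
    """
    payload = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        # Only contacts the API when a key is actually configured.
        req = build_image_request("A red backpack on a white background", key)
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        print(body["data"][0]["url"])
    else:
        print("Set OPENAI_API_KEY to send the request.")
```

Note that this request is text-only: the payload has no field for an input image, which is exactly the gap that 4o image generation in ChatGPT fills.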

Most notable for me: 4o image generation supports not just text prompts (like Dall-E did), but also image prompts. Gemini has recently added that capability as well, but I find 4o’s capabilities (or: steerability, as it’s advertised) to be better. Here are two extractive examples, with errors (only) in the small details:

Extracting a product from a scene, rotating it

Extracting a motif from a backpack

And it may be usable for translating slides as well:
EN -> DE slide translation
Notice, however, how our profile photos got substituted!

[Update 2025-03-27] 4o can create technical visualizations, at least simple ones. Steven Heidel of OpenAI shared this diagram, but without the prompt. I tried the first of the practice cases from my article on process visualization with a slightly modified text prompt, and it generally worked, with only minor flaws. The trick is to instruct ChatGPT to use the “Imagegen” tool.