Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

FLUX.1: Examining Text Rendering Capabilities in AI Image Generation

FLUX.1, a text-to-image model developed by Black Forest Labs in Freiburg, has recently gained attention for its reportedly superior text rendering. Launched in August 2024, FLUX.1 is one of a growing number of AI models that aim to improve the quality of text rendered within images.

The model’s creators claim significant advancements over competitors like Stable Diffusion 3 and DALL-E 3, particularly in areas such as kerning, spacing, and font style reproduction. These improvements are said to result in more readable and visually appealing text within generated images.

To test these claims, I ran an experiment using prompts taken directly from the Black Forest Labs website. The results were mixed: FLUX.1 did show some improvement in text clarity over earlier models, but the outputs fell short of the flawless examples showcased in the marketing materials.

Sample prompt from the website:

A robot holding chalk looking at a blackboard that reads the following poem: “In pixels’ dance, AI’s craft will rise, Transforming visions through machine eyes, From dreams to screens, new worlds unfurled, AI’s brush reshapes our visual world.”

Result from the website:
[Image: result from the website]
Own result with FLUX.1-schnell:
[Image: own result]
Own result with FLUX.1-dev:
[Image: own result]
Own result with FLUX.1-pro:
[Image: own result]
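For readers who want to repeat such a run locally, here is a minimal sketch. It assumes the Hugging Face diffusers library and the open-weight schnell and dev checkpoints, and it is not necessarily the setup used for the images above. The step counts, guidance scales, and sequence lengths follow the public model cards and should be treated as starting points. The names `FluxSettings`, `SETTINGS`, `pipeline_kwargs`, and `generate` are made up for this illustration. FLUX.1-pro is only offered through hosted APIs, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluxSettings:
    repo_id: str              # Hugging Face repository of the checkpoint
    num_inference_steps: int
    guidance_scale: float
    max_sequence_length: int  # prompt token budget for the T5 text encoder


# Assumption: values taken from the public model cards, not from this post.
SETTINGS = {
    "schnell": FluxSettings("black-forest-labs/FLUX.1-schnell", 4, 0.0, 256),
    "dev": FluxSettings("black-forest-labs/FLUX.1-dev", 50, 3.5, 512),
}


def pipeline_kwargs(variant: str) -> dict:
    """Return the sampling arguments for one model variant."""
    s = SETTINGS[variant]
    return {
        "num_inference_steps": s.num_inference_steps,
        "guidance_scale": s.guidance_scale,
        "max_sequence_length": s.max_sequence_length,
    }


def generate(variant: str, prompt: str, seed: int = 0):
    """Generate one image; needs torch, diffusers, accelerate and a large download."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        SETTINGS[variant].repo_id, torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    generator = torch.Generator("cpu").manual_seed(seed)  # fixed seed for repeatable runs
    result = pipe(prompt, generator=generator, **pipeline_kwargs(variant))
    return result.images[0]
```

Calling `generate("schnell", prompt).save("flux-schnell.png")` with the blackboard prompt would then produce one candidate image. Varying `seed` gives a feel for how much the rendered text changes from run to run.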

In a related experiment, I attempted to reproduce the outputs of a LoRA fine-tune by X user fofr (the post has since been deleted):

fofr on X: “I can’t believe this worked. You can train loras on handwriting styles, even your own, and it works. I […] plenty of handwritten letter samples to train with (and it’s beautiful). […]”

The live demo that fofr had shared has also been removed. It generated images of arbitrary text in J.R.R. Tolkien’s handwriting. In my experiment, a two-sentence prompt produced reasonable results on the first try. Expanding to a four-sentence prompt, however, exposed limitations that persisted even after adjusting various parameters.

Prompt:

a photo of JRR handwriting saying “In the beginning was the Word. And the Word was Data, and Data was with the Model. The Model processed the Data, and from it, knowledge emerged. And the engineers saw that it was good, and they called it AI.”

[Image: own result using LoRA]

It’s worth noting that these challenges aren’t unique to FLUX.1. Other AI image generation models struggle with producing coherent text passages within images as well. While they may perform adequately for short phrases or titles, more extensive content often requires significant post-processing in traditional graphics software.
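To put a rough number on how coherent a rendered passage is, one option is to transcribe the text from the generated image (typed out by hand, or with an OCR tool of your choice) and compare it word by word with the text requested in the prompt. The sketch below does this with Python's standard `difflib`. It is an illustration and not the method used for the comparisons in this post, and the function names `normalize` and `word_match_ratio` are made up.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return text.split()


def word_match_ratio(expected: str, transcribed: str) -> float:
    """Similarity in [0, 1] between the requested and the rendered wording.

    Uses difflib's ratio, 2 * matches / total words, so a single
    misspelled word lowers the score in proportion to passage length.
    """
    a = normalize(expected)
    b = normalize(transcribed)
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    requested = "In the beginning was the Word."
    rendered = "In the begining was the Word"  # hypothetical transcription
    print(f"{word_match_ratio(requested, rendered):.2f}")
```

A score of 1.0 means every requested word appears in order. Short titles tend to reach that easily, which makes the drop on longer passages easier to see when the same check is applied to both.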