Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

Updated: GPT-4o

(Updated on Jul 18)

So OpenAI have (pre-)released a new member of the GPT-4 family. It has been seen around the web under the guise of “gpt2-chatbot” or “GPT-4 Lite”, and is now available on the free tier of ChatGPT (with limits) and via the API. It’s important to realize that it’s a whole new model, in several ways: techniques learnt and developed for the other GPT-4 members may not work as well with this one.

In any case, the usual warnings still apply, but there is a new one: ChatGPT lists sources that do exist, but confabulates what is written there. As a result, the ChatGPT system with web browsing can give worse answers than the core GPT-4o model without it.

Example prompt:

Nils is one of the contributors to GNUnet. What’s his full name?

GPT-4o (via my OAI Chat; at temperature 0):

Nils Durner

… which is correct. ChatGPT however:

The full name of Nils, one of the contributors to GNUnet, is Nils Faerber (Wikipedia) (Gnunet)

… with neither link mentioning “Nils” or “Faerber”. (I don’t know Nils Faerber, either.) So the remark “Searched 6 sites” can mislead users, and to experts it hints at a systemic problem.

Nerdy detail: I’ve observed large fluctuations in model perplexity, which suggests that the model’s reliability in producing consistent results can vary significantly at times. As it stands, the cautions derived from experience with other OpenAI GPT models remain valid, e.g. that logprobs might not be reliable indicators of the model ‘hallucinating’ or generating incorrect content, and that seed effectiveness is not guaranteed.
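For readers who want to check this for themselves, here is a minimal sketch of how a perplexity figure can be derived from per-token logprobs. It assumes natural-log values, one per generated token, as the OpenAI API returns them when logprobs are requested. The function names and the sample numbers are made up for illustration, and this is not necessarily how the observation above was measured.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one completion: exp of the negative mean token logprob."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_range(runs: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Lowest and highest perplexity across repeated runs of the same prompt."""
    values = [perplexity(run) for run in runs]
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical logprobs from three runs of one prompt (illustrative only).
    runs = [
        [-0.01, -0.02, -0.05],
        [-0.30, -1.20, -0.40],
        [-0.02, -0.01, -0.90],
    ]
    low, high = perplexity_range(runs)
    print(f"perplexity ranges from {low:.3f} to {high:.3f}")
```

A wide gap between the lowest and highest value for the same prompt and settings is the kind of fluctuation meant above. A low perplexity only says the model was confident, not that it was correct.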

The model card has not been published yet; perhaps it will shed some light.

https://www.forbes.com/sites/kateoflahertyuk/2024/05/13/apples-new-chatgpt-deal-heres-what-it-means-for-your-iphone/. I never care about Apple leaks, but this could help contextualize “GPT-4 Lite”.

https://huggingface.co/spaces/vectara/leaderboard.