Nils Durner's Blog: Ahas, Breadcrumbs, Coding Epiphanies

text-davinci-003 vs. ChatGPT: a short technical note

During a discussion in March 2023, a colleague asked why text-davinci-003 produced concise, predictable answers, whereas ChatGPT tended to be more verbose and occasionally digressive. The explanation is largely technical.

Model family: text-davinci-003 belongs to the GPT-3 series. ChatGPT, at that point in time, was powered by GPT-3.5 and had been further refined with reinforcement learning from human feedback (RLHF) to optimise dialogue.

Temperature: The colleague had set temperature = 0 for text-davinci-003, which makes sampling almost deterministic. The public ChatGPT interface used a noticeably higher temperature (OpenAI never published the exact value; empirical tests indicated ≈ 0.7). Higher values increase the probability of sampling less frequent tokens, resulting in more diverse—but also less reproducible—output.
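Mechanically, temperature divides the model's raw next-token scores (logits) before the softmax: values near 0 concentrate almost all probability on the top token, while higher values flatten the distribution. A minimal sketch (the toy logits and function name are illustrative, not from any real model):

```python
import math
import random

def sample(logits, temperature=1.0, rng=random):
    """Sample a token index from logits after temperature scaling.

    temperature == 0 degenerates to greedy argmax; higher values
    flatten the distribution, so rarer tokens get picked more often.
    """
    if temperature == 0:
        # Deterministic: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]              # toy next-token scores
print(sample(logits, temperature=0))  # prints 0 on every run
```

At temperature 0 the call is reproducible across runs; at, say, temperature 2 the same logits yield a near-uniform distribution and the sampled index varies.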

Choosing the right setting depends on the task. Low temperature is appropriate for automated pipelines that must yield identical results across runs, e.g. code generation or regulatory text. A higher temperature can be helpful for brainstorming or creative writing, where variety is an asset.
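As a sketch, the near-deterministic pipeline setting maps onto a request like the following (legacy Completions endpoint of the pre-1.0 `openai` Python SDK; the prompt and token limit are illustrative):

```python
# Parameters for a reproducible pipeline call against text-davinci-003.
request = dict(
    model="text-davinci-003",
    prompt="Generate the SQL migration for the schema change.",
    temperature=0,    # identical output across runs
    max_tokens=256,
)
# openai.Completion.create(**request) would then return the completion;
# for brainstorming, one would raise temperature to roughly 0.7-1.0 instead.
```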

At the same time, OpenAI announced the GPT-4 API wait-list. Early benchmark data suggested a significant improvement in factual accuracy and reasoning while retaining the familiar sampling parameters (temperature, top-p, etc.). I therefore recommended registering for early access.

In retrospect (writing in 2025), both GPT-3 and GPT-3.5 have been superseded by GPT-4 and later models. Nonetheless, the principle remains: understand and control the generation parameters instead of relying on default settings.