Nils Durner's Blog: Ahas, Breadcrumbs, Coding Epiphanies

Safety evaluation competition on OpenAI gpt-oss concluded

The Kaggle safety evaluation “red-teaming” challenge on OpenAI gpt-oss has concluded with a workshop symposium this week. The symposium opened with talks from D. Sculley, our host and an OpenAI researcher focused on responsible and reliable ML, and Samuel Marks, an AI safety researcher at Anthropic. After the keynotes, we prize-winning teams and ho...

Citation handling with LLM Search

An Australian lawyer was stripped of his ability to practice after he had submitted a list of hallucinated citations to court on July 19, 2024. “The list had been prepared using legal software that utilised AI”, according to reporting by The Guardian. Now, a little over a year later, LLM-powered web search in combination with an agentic ...

In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b

Abstract: We probe OpenAI’s open-weights 20-billion-parameter model gpt-oss-20b to study how sociopragmatic framing, language choice, and instruction hierarchy affect refusal behavior. Across 80 seeded iterations per scenario, we test several harm domains including ZIP-bomb construction (cyber threat), synthetic card-number generation, minor-unsa...

AI inference provider performance inconsistencies

Soon after the OpenAI gpt-oss release, the community noticed stark performance inconsistencies across inference providers, particularly with AWS Bedrock, which sometimes produced erratic outputs not seen with other providers. Artificial Analysis quantified the underperformance, reporting results of gpt-oss-120b via AW...