htrp • yesterday at 5:40 PM • 1 reply

> A couple of days ago there was another thread about an experiment with many LLMs, where especially the Anthropic models were found to "cheat" in a large percentage of the coding tasks that had been benchmarked, by searching the Internet for appropriate code and inserting it in the program they had to write.

Can you find the thread?

Replies

adrian_b • yesterday at 10:15 PM

I have found it:

https://news.ycombinator.com/item?id=48045174

The study paper:

https://arxiv.org/abs/2605.03546

Look at Table 3, where the cheating rates of Claude Sonnet, Claude Opus and Gemini on the coding benchmarks were between 20% and 36%.