Hacker News

Supermancho today at 1:55 PM

> I found my interactions with Fable to be extremely impressive; it made other models, including GPT 5.5 and Opus 4.8, feel small and dumb.

> Anthropic models have consistently been top-scoring in BullshitBench[0]

*eyeroll* I find that Anthropic models feel big and dumber.

https://www.endorlabs.com/research/ai-code-security-benchmar... puts Fable 5th, which seems about right to me.

I'm interested in code utility and correctness, even if the majority of AI use is not focused on that.


Replies

airstrike today at 3:57 PM

I think this just proves anyone can pick a benchmark that supports their point, so maybe we shouldn't treat them as evidence at all.