Hacker News

lordgrenville, today at 11:52 AM

Almost every comment here is appealing to personal experience. By contrast, OP refers to two studies that compare performance on some kind of standardised test over a range of models.

Can't speak to how good those tests are, but they can't be worse than anecdotal evidence for something as vague/subjective as LLM performance.


Replies

bhy, today at 12:10 PM

But the studies are from 2024 and 2025. They don't apply to current Claude models.