lordgrenville • today at 11:52 AM • 1 reply

Almost every comment here is appealing to personal experience. By contrast, OP refers to two studies that compare performance on some kind of standardised test over a range of models.

Can't speak to how good those tests are, but they can't be worse than anecdotal evidence for something as vague/subjective as LLM performance.

Replies

bhy • today at 12:10 PM

But the studies are from 2024 and 2025. They don’t apply to current Claude models.
