perdomon • yesterday at 3:07 PM • 4 replies

It seems like we're hitting a solid plateau of LLM performance with only slight changes each generation. The jumps between versions are getting smaller. When will the AI bubble pop?

Replies

aoeusnth1 • yesterday at 3:48 PM

The SWE-bench Pro score is ~20% higher than the previous .1 generation, which was released 2 months ago. On their SWE benchmark, token consumption at iso-performance is down 2x from the model they released 2 months ago.

If this is a plateau, I struggle to imagine what you'd consider fast progress.

abstracthinking • yesterday at 3:50 PM

Your comment doesn't make any sense: Opus 4.6 was released two months ago. What jump would you expect?

lta • yesterday at 3:13 PM

Every night praying for tomorrow

NickNaraghi • yesterday at 3:54 PM

The generations are two months apart now though…
