Papazsazsa • last Tuesday at 6:06 PM • 2 replies

"Opus 4.5 feels to me like"

The article is fine as opinion, but at what point are we going to either:

a) establish benchmarks that make sense and are reliable, or

b) stop with the hype cycle stuff?

Replies

NewsaHackO • last Tuesday at 6:09 PM

> establish benchmarks that make sense and are reliable

How aren't current LLM coding benchmarks reliable?

cardine • last Tuesday at 7:05 PM

> make sense and are reliable

If you can figure out how to create benchmarks that make sense, are reliable, correlate strongly to business goals, and don't get immediately saturated or contorted once known, you are well on your way to becoming a billionaire.
