Hacker News

dpoloncsak last Thursday at 2:23 PM

> Knowing that ChatGPT output good tokens last Tuesday but Sonnet didn't does not help us know much about the future of the tools in general.

Isn't that exactly what is going to help us understand the value these tools bring to end users, and how to optimize them for better future use? None of these models are copy+pastes; they tend to do things slightly differently under the hood. How those differences affect results seems like exactly the data we would want here.


Replies

ehnto yesterday at 3:29 AM

I guess I disagree that the main concern is the differences between individual models, rather than the overall technology of LLMs in general. Given how fast it's all changing, I would personally rather focus on the broader conversation. I don't really care if GPT5 is better at benchmarks; I care whether LLMs are actually capable of the type of reasoning and productive output that the world currently thinks they are.
