Hacker News

sdenton4, yesterday at 7:28 PM

Doing great on public datasets and underperforming on private benchmarks is not a good look.


Replies

Deegy, yesterday at 7:46 PM

Is it, though? Do we still expect that LLMs will eventually be able to solve problems they haven't seen before? Or, at this point, do we just want the most accurate autocomplete at the cheapest price?
