logoalt Hacker News

utopiah01/16/20261 replyview on HN

> The tech is moving so fast.

Well that's exactly the problem : how can one say that?

The entire process of evaluating what "it" actually does has been a problem from the start. Input text, output text ... OK but what if the training data includes the evaluation? This was ridiculous few years ago but then the scale went from some curated text datasets to... most of the Web as text, to most of the Web as text including transcription from videos, to most of the Web plus some non public databases, to all that PLUS (and that's just cheating) tests that were supposed to be designed to NOT be present elsewhere.

So again, that's the crux of the problem, WHAT does it actually do? Is it "just" search? Is it semantic search with search and replace, is it that plus evaluation that it runs?

Sure the scaffolding becomes bigger, the available dataset becomes larger, the compute available keeps on increasing but it STILL does not answer the fundamental question, namely what is being done. The assumption here is because the output text does solve the question ask, then "it" works, it "solved" the problem. The problem is that by definition the entire setup has been made in order to look as plausible as possible. So it's not luck that it initially appears realistic. It's not luck that it can thus pass some dedicated benchmark, but it is also NOT solving the problem.

So yes sure the "tech" is moving "so fast" but we still can't agree on what it does, we keep on having no good benchmarks, we keep on having that jagged frontier https://www.hbs.edu/faculty/Pages/item.aspx?num=64700 that makes it so challenging to make more meaningful statement than "moving so fast" which sounds like marketing claims.


Replies

computerex01/16/2026

You know LLM's have been used to solve very hard previously unsolved math problems like some of the Erdos problems?

show 2 replies