

I think you're confusing OP with the people who claim there is zero functional difference between an LLM and a search engine that just parrots back whatever is already in it. But OP never made such a claim. Here, let me try: the simplest explanation for how next-token estimation leads to a model that often produces true answers is that for most inputs, the most likely next token is the true one. Given their size and the way they're trained, LLMs obviously don't just ingest training data like a big archive; they contain something like an abstract representation of tokens and concepts. While not exactly like human knowledge, the network is large and deep enough that LLMs can predict true statements based on the preceding text. This also lets them answer questions that aren't in their training data, although accuracy obviously suffers the further you deviate from well-covered topics. For most questions, the most likely next token is the true answer, so LLMs essentially ended up being trained to estimate truth.
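
To make that concrete, here's a minimal toy sketch (my illustration, not OP's argument): a next-token "predictor" that just counts bigrams over a tiny made-up corpus. Because the corpus mostly contains the true statement, the most likely continuation of the prompt is the true answer, which is the whole point in miniature. Real LLMs obviously aren't bigram counters; the corpus and counting scheme here are purely hypothetical.

    from collections import Counter, defaultdict

    # Tiny hypothetical corpus: mostly true statements, one falsehood.
    corpus = [
        "the capital of france is paris",
        "the capital of france is paris",
        "the capital of france is paris",
        "the capital of france is lyon",  # occasional error in the data
    ]

    # Count how often each token follows each other token.
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1

    # "Predict" by taking the most frequent continuation of the last token.
    prompt = "the capital of france is"
    last = prompt.split()[-1]
    prediction = bigrams[last].most_common(1)[0][0]
    print(prediction)  # "paris" -- the majority (true) continuation wins
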

I'm not saying this is bad or underwhelming, by the way. It's incredible how far people have been able to push machine learning with just the knowledge we have now, and how they're still making progress. I'm just saying it's not magic; it's not some unsolved problem in mathematics.