jmalicki • yesterday at 1:23 PM • 5 replies

I find it wild that these internet arguments talk about LLMs as if they were trained by reading the internet.

Yes, pretraining still exists. But for the past few years, pretraining by reading the internet has been just the initial bootstrapping step of LLM training. What dominates these days is RL training on bespoke training data, with very, very different characteristics from what these armchair analyses claim.

Replies

MattRogish • yesterday at 3:22 PM

I'd have to imagine there are wildly diminishing marginal returns to additional SFT/post-training passes.

There is a bounded number of (useful) derivations/combinations of Duff's device.

If frontier labs wish to reduce hallucinations on factual things, they will have to hire people (or the data providers will) to do fundamental research above and beyond what is available in the existing literature and on the web. I.e., if the LLMs want to lower precision error, they need to go out and actually find more expertise. If the Wikipedia page for Pompey lacks data, where are they going to get it from? How would they even _identify_ that the page has holes?

Yes, they can digitize more books, but that is untrustworthy data: if there were enough eyeballs on a particular work, it would already be on the internet. If it's not, they'd need to hire the experts themselves. They need expert reviewers in virtually every interesting topic, which is fundamentally an intractable problem, especially since things change all the time. Maybe even uninteresting topics, too?

I dunno, it doesn't seem to me that "more data" is the magic bullet here. Yeah, it will "help", but we're already on the flat part of the S-shaped curve.

My take from trying to understand this stuff is some sort of algorithmic improvement is necessary to get another step change in how well LLMs perform in this area. I could be wrong!

jazzdev • yesterday at 11:35 PM

In the last few weeks Claude (Sonnet) has told me “I don’t know” 3 different times. That seems like the solution to hallucinations and it’s already happening.

mcphage • yesterday at 2:41 PM

Where do they get the bespoke training data from? And how much? I don’t really know anything about this.

jgalt212 • yesterday at 5:41 PM

Outside of games and coding, generating enough valid examples and counter-examples to harness the power of RL is cost-prohibitive.

dominotw • yesterday at 4:04 PM

let me take down armchair analysis with my armchair analysis