an0malous • yesterday at 12:49 PM

> why are we concluding that bigger models and more data = more hallucination?

That’s not what your quotes said. They said bigger models = plateau in intelligence, nothing about more data or increased hallucinations.

The relevant quote for what you’re talking about would be:

> It’s been proven that when a model is trained on large volumes of highly factual and non-theoretical data, it learns to always have an answer.

So there are two separate claims: 1) bigger models show plateauing results, and 2) models trained on larger amounts of factual data have a higher hallucination rate.

I’m pretty sure #1 is well known; I think OpenAI’s own research on scaling laws showed diminishing returns on parameter count and training data volume years ago. I don’t know what the support for #2 is besides the contents of the post itself.

Replies

jmalicki • yesterday at 1:23 PM

I find it wild that these internet arguments talk about LLMs as if they are trained by reading the internet.

Yes, pretraining still exists. But for the past few years, pretraining by reading the internet has been just the initial bootstrapping stage of LLM training. The RL training they get from bespoke training data, which has very, very different characteristics from what these armchair analyses claim, dominates these days.

themgt • yesterday at 3:20 PM

> That’s not what your quotes said. They said bigger models = plateau in intelligence, nothing about more data or increased hallucinations ... I’m pretty sure #1 is well known

Well known in a multiverse branch where Fable was a dud?

coffeefirst • yesterday at 3:11 PM

Yeah, #2 may be incidental. Suppose one lab focused on bigger models, and another on reinforcement training geared towards factual accuracy over sycophancy. You could easily wind up with a model from the second lab that is less powerful but more accurate.

I can’t prove it but I suspect there’s a bit of that going on.

ifwinterco • yesterday at 8:52 PM

#2 is not that surprising from first principles if the way you made the bigger model was by feeding it poorer-quality training data, because that’s the only way you can get enough of it.