
wolttam yesterday at 12:32 PM, 12 replies

> it is clear that actual intelligence has plateaued significantly.

> Moving forward, the industry cannot continue to train bigger and bigger models since their intelligence not only plateaus but often will get worse

These are wild claims - why are we concluding that bigger models and more data = more hallucination? That’s actually the opposite of what’s been happening over the last couple of years. Some models may still hallucinate more, but they all hallucinate much less than the original 175B ChatGPT, which was smaller and trained on (much) less data than anything current.

Edit: My mention of data comes from this quote:

> A shift is happening among major AI labs, who are becoming increasingly skeptical of endless parameter count and training data scaling

My take on the current situation: it seems clear that the industry has seen that there is still a lot left to squeeze out of sub-1T models. But for that you do need more high-quality data in the distribution for which you want to unlock capabilities.


Replies

an0malous yesterday at 12:49 PM

> why are we concluding that bigger models and more data = more hallucination?

That’s not what your quotes said. They said bigger models = plateau in intelligence, nothing about more data or increased hallucinations.

The relevant quote for what you’re talking about would be:

> It’s been proven that when a model is trained on large volumes of highly factual and non-theoretical data, it learns to always have an answer.

So there are two separate claims: 1) bigger models have plateauing results, and 2) models trained on larger amounts of factual data have a higher hallucination rate.

I’m pretty sure #1 is well known; I think OpenAI’s own research on scaling laws showed diminishing returns on parameter count and training data volume years ago. I don’t know what the support for #2 is besides the actual post contents.
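
For anyone who wants to see what "diminishing returns" looks like numerically, here is a minimal sketch. It assumes loss follows a simple power law in parameter count, L(N) = (scale / N) ** alpha. The function names and the constants `scale` and `alpha` are hypothetical, picked for illustration rather than taken from any paper or fitted to any model.

```python
def power_law_loss(n_params: float, scale: float = 1.0e13, alpha: float = 0.08) -> float:
    """Hypothetical loss curve L(N) = (scale / N) ** alpha.

    scale and alpha are illustrative assumptions, not measured values.
    """
    return (scale / n_params) ** alpha


def gains_per_doubling(start: float, doublings: int) -> list[float]:
    """Absolute loss reduction obtained from each successive doubling of N."""
    gains = []
    n = start
    for _ in range(doublings):
        gains.append(power_law_loss(n) - power_law_loss(2 * n))
        n *= 2
    return gains


if __name__ == "__main__":
    # Each doubling still helps, but by a smaller absolute amount than the last.
    for i, gain in enumerate(gains_per_doubling(1e9, 5), start=1):
        print(f"doubling {i}: loss drops by {gain:.4f}")
```

Under this assumed curve every doubling still lowers the loss, just by less each time. That is diminishing returns, which is not the same thing as getting worse.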

bilater yesterday at 6:02 PM

Yeah, not only is it totally unsubstantiated, but the benchmarks are also getting less useful for showing the difference between these models. Big model smell is still a thing, and GLM 5.2, while impressive, is not Fable class.

Here is something I would like people to chew on. Perhaps the smartest researchers in the world across multiple labs know more about this than we do? Perhaps they are aware of issues like the data wall and diminishing marginal returns. And perhaps they are being honest when they tell you there is no wall?

eurekin yesterday at 12:59 PM

> A shift is happening among major AI labs, who are becoming increasingly skeptical of endless parameter count and training data scaling

I'm pretty sure it's mostly due to the training data quality. No idea why this never gets mentioned in those discussions.

It was obvious right from the get-go that the scaling law just enabled some abilities that were described by the underlying data, allowing the ANN to abstract them in the latent space.

aurareturn yesterday at 3:00 PM

Aren't hallucinations also heavily influenced by compute and memory capacity? I.e., companies can spend more time verifying results in an agentic format, spend more thinking tokens, and use less quantization. All of these depend heavily on compute and memory but are proven to decrease hallucinations.

Maybe GPT 5.5 is heavily nerfed due to lack of compute, memory, and energy?

I agree that it's far-fetched to conclude that bigger models have plateaued.

madduci yesterday at 12:36 PM

Isn't that a case of overfitting? You have more data, but when you ask something that's not in that data, hallucinations happen.

utopiah yesterday at 4:32 PM

>> it is clear that actual intelligence has plateaued significantly.

> These are wild claims -

Indeed, it is not clear there was any actual intelligence at any point.

A lot of generated content sure, sometimes even useful, but not necessarily anything more.

coldtea yesterday at 12:53 PM

> These are wild claims - why are we concluding that bigger models and more data = more hallucination?

Because that's what they measured in this case.

resters yesterday at 3:23 PM

To train models to be smarter than they are, one needs examples and cases to train on, and once you get close to the top percentiles of human reasoning there is extremely little such material available.

You can create contrived logic problems, but they often turn into language games because English is not formal logic.

And you can train on "monty hall" style problems, but those too are language games that are intriguing to humans but obvious when framed slightly differently.

In other words, model trainers are fighting against the overwhelming mediocrity of the training corpus (all of the recorded human output from history).

As models improve, the next phase will be models co-designed with humans to overcome these limits. The way we use language and the process we use to solve problems (we currently call this "orchestration") will evolve as part of this. Meatspace metaphors map badly when we have massive context and don't need the same limits. How different is hallucination from extrapolation, and so on?

Much of the skepticism and confusion about LLMs is no different than a person of average intelligence hearing a highly intelligent person explain something and considering the explanation gibberish, then arrogantly accusing the intelligent person of being unhelpful.

Much like dogs were domesticated from wolves to have traits that make them good around humans, LLMs will evolve around our limits, around our arrogance, around our aesthetic biases and prejudices. Intelligence and rationality are fundamentally not what most humans want from an LLM.

blurbleblurble yesterday at 2:17 PM

How do we know GPT 5.5 is a bigger model?

claytongulick yesterday at 4:05 PM

My impression is that the fundamental issue is that LLMs attempt to extract reasoning (executive function) from data (relationships between tokens).

There's an open question about whether this is theoretically possible, but it doesn't seem like it to me.

Human generated data is an effect of reasoning. Attempting to extract executive function from it is kind of like taking an anti-derivative of a function.

This has always seemed like the root of hallucinations to me. It sort of follows the parallels to lossy compression that a lot of people draw. You're extracting some characteristics by observing the relationship between tokens, and then trying to argue that those characteristics are equivalent to the thing that generated the original tokens.

Surely there's some sort of overlap there, but viewed that way, it seems obvious that more and more parameters and scaling won't solve the fundamental problem. There's only so much meaning you can extract from token relationships.

It's like trying to derive the shape of a flame from the smoke it produces.

The original intelligence that created those tokens was driven by a whole universe of inputs, from hormones to starlight to gravity, not to mention all of the strange things about consciousness and parapsychology that are so poorly understood.

The machines are definitely useful for a certain class of tasks - those that don't require much executive function, where the useful work mostly involves pattern matching.

The problem is, we seem to be mistaking effect for cause and imagining that these things have greater capabilities than they'll ever possess.

The investors that don't understand this are indeed going to learn a bitter lesson.

dominotw yesterday at 4:03 PM

You mixed two random quotes from the article to create a strawman.

Of course you knew what you were doing, but it's disappointing that this was the top comment.

harrall yesterday at 2:47 PM

In cognitive science, it appears your brain has two modes of thinking:

- A very parallel type of computation that is fast and generally accurate and integrates hundreds of variables. It’s sometimes labeled as intuition or system 1 thinking.

- A much slower, step by step, analytical type, commonly linked with your pre-frontal cortex (one of the newest parts of the brain). Sometimes called system 2 thinking.

Maybe the way the universe works is that all computation is more or less one of those two types. In which case, an LLM alone is only the first part, which is often right but whose results can never be proven.
