Hacker News

eqmvii · yesterday at 4:39 PM

Could this be an experiment to show how likely LLMs are to lead to AGI, or at least intelligence well beyond our current level?

If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?


Replies

ben_w · yesterday at 4:49 PM

> Could this be an experiment to show how likely LLMs are to lead to AGI, or at least intelligence well beyond our current level?

You'd have to be specific about what you mean by AGI: all three letters mean a different thing to different people, and sometimes the whole term means something not present in the letters.

> If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?

To a limited degree.

Some developments can come from combining existing ideas and seeing what they imply.

Other things, like everything to do with relativity and quantum mechanics, would have required experiments. I don't think any of the relevant experiments had been done prior to this cut-off date, but I'm not absolutely sure of that.

You might be able to get such an LLM to develop all the maths and geometry for general relativity, and yet find the AI still tells you that the perihelion shift of Mercury is a sign of the planet Vulcan rather than of a curved spacetime: https://en.wikipedia.org/wiki/Vulcan_(hypothetical_planet)

water-data-dude · yesterday at 5:31 PM

It'd be difficult to prove that you hadn't leaked information to the model. The big gotcha of LLMs is that you train them on BIG corpuses of data, which means it's hard to say "X isn't in this corpus", or "this corpus only contains Y". You could TRY to assemble a set of training data that only contains text from before a certain date, but it'd be tricky as heck to be SURE about it.

Ways data might leak to the model that come to mind: misfiled/mislabeled documents, footnotes, annotations, document metadata.
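A minimal sketch of the kind of filtering this implies, assuming each document carries a publication date and that you hand-curate a denylist of anachronistic terms; the `published` field, the `ANACHRONISMS` list, and the sample documents are all hypothetical:

```python
from datetime import date

CUTOFF = date(1900, 1, 1)

# Hypothetical denylist of terms that should not appear in a pre-1900 corpus.
# A real one would need to be far larger and hand-curated.
ANACHRONISMS = {"photon", "radioactivity", "airplane", "general relativity"}

def is_clean(doc: dict) -> bool:
    """Keep a document only if it is dated before the cutoff and
    contains none of the known anachronistic terms."""
    published = doc.get("published")
    if published is None or published >= CUTOFF:
        return False
    text = doc["text"].lower()
    return not any(term in text for term in ANACHRONISMS)

corpus = [
    {"published": date(1859, 11, 24), "text": "On the Origin of Species ..."},
    {"published": date(1916, 5, 11), "text": "The foundation of the general theory of relativity ..."},
    {"published": None, "text": "Undated editor's footnote mentioning the photon ..."},
]

clean = [d for d in corpus if is_clean(d)]
print(len(clean))  # 1 -- only the 1859 document survives both checks
```

Even with both checks, a correctly dated reprint whose editor added a modern footnote would sail straight through, which is exactly the kind of leak described above.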

alansaber · yesterday at 4:43 PM

I think not, if only because the quantity of old data isn't enough to train anything near a SoTA model, at least not until we change some fundamentals of LLM architecture.
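A back-of-the-envelope sketch of that data constraint, using the often-cited Chinchilla rule of thumb of roughly 20 training tokens per parameter; the corpus and frontier-run token counts below are rough assumptions, not measurements:

```python
# Rough, assumed numbers -- not measurements.
TOKENS_PER_PARAM = 20            # Chinchilla-style rule of thumb for compute-optimal training
pre_1900_corpus_tokens = 50e9    # guess: tens of billions of tokens of digitized pre-1900 text
modern_frontier_tokens = 10e12   # guess: order of 10T tokens for a current frontier training run

compute_optimal_params = pre_1900_corpus_tokens / TOKENS_PER_PARAM
shortfall = modern_frontier_tokens / pre_1900_corpus_tokens

print(f"Compute-optimal model for the old corpus: ~{compute_optimal_params / 1e9:.1f}B parameters")
print(f"Token shortfall vs a modern frontier run: ~{shortfall:.0f}x")
```

Under those guesses the surviving pre-1900 text supports only a few-billion-parameter compute-optimal model, a couple of orders of magnitude short of a modern training run.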

armcat · yesterday at 4:56 PM

I think this would be an awesome experiment. However, you would effectively need to train a GPT-5.2 equivalent. So you need a lot of text, a much larger parameterization (compared to nanoGPT and Phi-1.5), and the 1800s equivalents of supervised fine-tuning and reinforcement learning from human feedback.

dexwiz · yesterday at 5:06 PM

This would be a true test of whether LLMs can innovate or just regurgitate. I think part of people's amazement at LLMs is that they don't realize how much they don't know, so thinking and recalling look the same to the end user.

Trufa · yesterday at 4:57 PM

This is fascinating, but the experiment seems to fall short of being a fair comparison of how much knowledge we can extract from data of that era versus now.

As a thought experiment I find it thrilling.

Rebuff5007 · yesterday at 5:01 PM

OF COURSE!

The fact that tech leaders espouse the brilliance of LLMs and don't use this specific test method is infuriating to me. It is deeply unfortunate that there is little transparency or standardization of the datasets available for training/fine tuning.

Having this approach publicized would make for more interesting and informative benchmarks. OEM models that are always "breaking" the benchmarks are doing so with improved datasets as well as improved methods. Without holding the datasets fixed, progress on benchmarks is very suspect IMO.

nickpsecurity · yesterday at 8:54 PM

That is one of the reasons I want it done. We can't tell if AIs are parroting training data without having the whole training data. Making it old means specific things won't be in it (or will be), so we can do more meaningful experiments.


feisty0630 · yesterday at 4:52 PM

I fail to see how the two concepts equate.

LLMs have neither intelligence nor problem-solving ability (and I won't be relaxing the definition of either so that some AI bro can pretend a glorified chatbot is sentient).

You would, at best, be demonstrating that the sharing of knowledge across multiple disciplines and nations (which is a relatively new concept - at least at the scale of something like the internet) leads to novel ideas.
