> A lot of that success is from reinforcement learning techniques where the LLM is made to solve tons of math problems after the pre-training “read everything” step, and gets to update its weights based on whether it solves them. LLMs aren’t just trained by reading a lot of text anymore. It’s very similar to how the AlphaZero chess engine was trained, in fact.
It's closer to AlphaGo, which was first trained on expert human games and then 'fine-tuned' with self-play.
AlphaZero specifically did not use human training data at all.
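Back to the quoted claim about RL on math problems: in case it's not obvious what that looks like concretely, here's a toy REINFORCE-style sketch in PyTorch: sample an answer, check it against a verifier, and reinforce it if it was right. Everything in it (the tiny policy, the addition problems, the hyperparameters) is invented for illustration; real pipelines do this with a full LLM and fancier algorithms like PPO/GRPO.

```python
# Toy sketch of RL with verifiable rewards (REINFORCE-style).
# The tiny policy, the arithmetic "problems" and the hyperparameters are all
# made up; this only illustrates the shape of the training loop.
import random
import torch
import torch.nn as nn

ANSWERS = list(range(20))          # the policy picks one of 20 candidate answers

class TinyPolicy(nn.Module):
    """Maps an (a, b) addition problem to a distribution over candidate answers."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, len(ANSWERS)))

    def forward(self, problem):
        return self.net(problem)   # logits over ANSWERS

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    a, b = random.randint(0, 9), random.randint(0, 9)
    logits = policy(torch.tensor([float(a), float(b)]))
    dist = torch.distributions.Categorical(logits=logits)
    answer = dist.sample()                             # "generate" an answer
    reward = 1.0 if answer.item() == a + b else 0.0    # verifiable reward: is the math right?
    loss = -dist.log_prob(answer) * reward             # REINFORCE: push up answers that scored
    opt.zero_grad()
    loss.backward()
    opt.step()
```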
I am waiting for an AlphaZero-style general AI. ('General' not in the AGI sense but in the ChatGPT sense of something you can throw arbitrary problems at and it will give them a good go, though not necessarily at human level yet.) I just don't want to call it an LLM, because it wouldn't necessarily be trained on language.
What I have in mind is something that first solves lots and lots of problems, e.g. logic problems, formally posed programming problems, computer games, predicting the next frames of a webcam video, economic time series, whatever, as a sort of pre-training step; then later perhaps you feed it a relatively small amount of human-readable text and speech so you can talk to it. (There's a toy sketch of this below.)
Just to be clear: this is not meant as a suggestion for how to successfully train an AI. I'm just curious whether it would work at all and how well / how badly.
Presumably there's a reason why all SOTA models go 'predict human-produced text first, then learn problem solving afterwards'.
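To make the idea concrete, here's a toy PyTorch sketch of that scheme: one shared trunk pre-trained on a grab-bag of synthetic non-language tasks, with a fresh head bolted on afterwards for a (hypothetically small) language stage. The tasks, model and sizes are all invented, and this says nothing about whether the approach would actually work at scale.

```python
# Toy sketch: pre-train one shared trunk on many non-language problems,
# then attach a small language head afterwards. Everything here is invented
# for illustration.
import torch
import torch.nn as nn

SEQ_LEN, VOCAB, DIM = 8, 32, 64

class SharedTrunk(nn.Module):
    """One shared sequence encoder reused across all the pre-training tasks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)

    def forward(self, tokens):                  # tokens: (batch, SEQ_LEN) integer ids
        h, _ = self.encoder(self.embed(tokens))
        return h[:, -1]                         # final hidden state as a summary

trunk = SharedTrunk()
heads = {                                       # one small output head per task
    "parity": nn.Linear(DIM, 2),                # is the sum of the sequence even or odd?
    "next":   nn.Linear(DIM, VOCAB),            # next element of an arithmetic sequence
}
params = list(trunk.parameters()) + [p for h in heads.values() for p in h.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def sample_batch(task, n=64):
    """Generate a batch of synthetic problems for one pre-training task."""
    if task == "parity":
        x = torch.randint(0, VOCAB, (n, SEQ_LEN))
        y = x.sum(dim=1) % 2
    else:                                       # "next": sequences start, start+step, ...
        start = torch.randint(0, 8, (n, 1))
        step = torch.randint(1, 3, (n, 1))
        x = (start + step * torch.arange(SEQ_LEN)) % VOCAB
        y = (start.squeeze(1) + step.squeeze(1) * SEQ_LEN) % VOCAB
    return x, y

# "Pre-training": interleave lots of non-language problems through the same trunk.
for _ in range(1000):
    for task, head in heads.items():
        x, y = sample_batch(task)
        loss = loss_fn(head(trunk(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# "Language" stage: afterwards, attach a fresh head and train it the same way on a
# (hypothetically small) amount of tokenized text, reusing the trunk's representations.
text_head = nn.Linear(DIM, VOCAB)
```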
> I think the biggest limitation LLMs actually have, the one that is the biggest barrier to AGI, is that they can’t learn on the job, during inference. This means that with a novel codebase they are never able to build a good model of it, because they can never update their weights. [...]
Yes, I agree. But 'on-the-job' training is also such an obvious idea that plenty of people are working on making it work.
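The most naive version of 'on-the-job' learning is plain test-time fine-tuning: take a few gradient steps on the novel codebase before answering questions about it. A minimal sketch, using gpt2 as a stand-in model and made-up code snippets; a real system would also have to deal with catastrophic forgetting, which this doesn't.

```python
# Minimal sketch of test-time fine-tuning on a "novel codebase": a few gradient
# steps on its files so the weights absorb some of it before answering questions.
# gpt2 is just a stand-in model; the snippets and hyperparameters are invented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical snippets standing in for the files of the new codebase.
codebase_files = ["def connect(db_url):\n    ...", "class JobQueue:\n    ..."]

model.train()
for epoch in range(3):                          # a handful of passes over the new code
    for text in codebase_files:
        batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
        loss = model(**batch, labels=batch["input_ids"]).loss  # standard LM loss on the new code
        opt.zero_grad()
        loss.backward()
        opt.step()

model.eval()                                    # now answer questions with the adapted weights
```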