Hacker News

qcnguy · today at 10:23 AM · 4 replies

> I don't know what he was talking about

There are a bunch of ways AI is improving itself, depending on how you want to interpret that. It's been true since the start.

1. AI is used to train AI. RLHF uses this, curriculum learning is full of it, video model training pipelines are overflowing with it. AI gets used in pipelines to clean and upgrade training data a lot.

2. There are experimental AI agents that can patch their own code and explore a tree of possibilities to boost their own performance (rough sketch of the loop below). However, at the moment they tap out after getting about as good as open-source agents, but before they're as good as proprietary agents. There isn't exponential growth. There might be if you throw enough compute at it, but this tactic is very compute-hungry; at current prices it's cheaper to pay an AI expert to implement your agent than to use this.
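
To make #2 concrete, the loop those agents run looks roughly like this. It's a minimal sketch: propose_patch() and run_benchmark() are made-up stand-ins for the LLM call and the evaluation harness, not any particular system's API.

    # Sketch of a self-improving agent: propose patches to its own source,
    # score each variant on a fixed benchmark, keep the best few, repeat.
    # propose_patch() and run_benchmark() are stand-ins, not a real API.
    import random

    def run_benchmark(agent_source: str) -> float:
        # Stand-in for running the agent on a task suite and scoring it.
        return random.random()

    def propose_patch(agent_source: str) -> str:
        # Stand-in for asking an LLM to rewrite its own code to score higher.
        return agent_source + f"\n# tweak {random.randint(0, 9999)}"

    def self_improve(seed_source: str, generations: int = 5, branching: int = 4) -> str:
        frontier = [(run_benchmark(seed_source), seed_source)]
        for _ in range(generations):
            candidates = list(frontier)
            for _, src in frontier:
                for _ in range(branching):
                    patched = propose_patch(src)   # the agent edits itself
                    candidates.append((run_benchmark(patched), patched))
            # Prune hard: every node costs a full benchmark run, which is why
            # this is so compute-hungry in practice.
            frontier = sorted(candidates, key=lambda c: c[0], reverse=True)[:branching]
        return frontier[0][1]

    best = self_improve("def solve(task): ...")

The expensive part is the inner run_benchmark call on every candidate; the search itself is trivial.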


Replies

Eggpants · today at 4:50 PM

So have an AI with a 40% error rate judge an AI with a 40% error rate…

AGI is a complete no-go until a model can adjust its own weights on the fly, which requires some kind of negative feedback loop, which in turn requires a means of detecting failure.

Humans have pain receptors to provide negative feedback, and we can imagine that events such as driving into a parked car would be painful without having to experience them.

If current models could adjust their own weights to fix the famous “how many r’s in strawberry” failure, then I would say we are on the right path.

However, the current solution is to detect the question and forward it to a function that determines the right answer, or to add more training data the next time the model is trained ($$$). In other words, cheat the test.
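
Mechanically, the negative feedback loop described above is just online learning: detect a failure, turn it into a loss, nudge the live weights. A toy sketch of that shape (plain online logistic regression, not anything a deployed LLM actually does):

    # Toy illustration of "adjust its own weights on the fly": a failure signal
    # (prediction error) directly drives a gradient step on the live weights.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)                    # the weights being adjusted live

    def predict(x: np.ndarray) -> float:
        return 1.0 / (1.0 + np.exp(-x @ w))   # sigmoid probability

    def feedback_step(x: np.ndarray, correct: int, lr: float = 0.1) -> None:
        global w
        error = predict(x) - correct          # negative feedback: how wrong were we?
        w -= lr * error * x                   # adjust weights toward fewer failures

    # Every interaction immediately updates the model; no retraining run needed.
    for _ in range(1000):
        x = rng.normal(size=3)
        label = int(x[0] + 0.5 * x[1] > 0)    # hidden ground truth for the demo
        feedback_step(x, label)

Doing that safely at LLM scale, without the model trashing itself, is the unsolved part.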

mitjam · today at 3:20 PM

I think using an LLM as a toolsmith, as demonstrated in the Voyager paper (1), is another interesting approach to building a system that learns to do a task better over time. (1) https://arxiv.org/abs/2305.16291
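
For anyone who hasn't read it: one core trick is a skill library of code the agent writes for itself. A minimal sketch of that shape, with write_skill() standing in for the LLM call (the real Voyager pipeline adds an automatic curriculum and iterative self-verification on top):

    # "LLM as toolsmith" in miniature: write a small tool once, verify it,
    # store it, and reuse it on later tasks instead of re-deriving everything.
    skill_library: dict[str, str] = {}         # task name -> skill source code

    def write_skill(task: str) -> str:
        # Stand-in for asking an LLM to write a function that solves `task`.
        return f"def {task}():\n    return '{task} done'"

    def verify(source: str) -> bool:
        try:
            compile(source, "<skill>", "exec")  # cheap sanity check for this sketch
            return True
        except SyntaxError:
            return False

    def solve(task: str) -> str:
        if task not in skill_library:           # learn the skill once...
            source = write_skill(task)
            if verify(source):
                skill_library[task] = source
        return skill_library.get(task, "")      # ...then reuse it on later calls

    solve("chop_tree")
    solve("chop_tree")   # the second call hits the library, not the LLM

The system gets better over time because the library grows, even though the model's weights never change.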

franktankbank · today at 12:54 PM

I'm skeptical that RLHF really works. Doesn't it just patch the obvious holes so the model looks better on paper? If it can't reason, it will keep getting second- and third-order problems wrong.

Yoric · today at 10:42 AM

> There are experimental AI agents that can patch their own code and explore a tree of possibilities to boost their own performance. However, at the moment they tap out after getting about as good as open source agents, but before they're as good as proprietary agents.

Interesting. Do you have links?
