But don't existing AI systems already learn in some way? The training steps are the AI learning, after all. And if your training material is set up by something like Claude Code, then it's already a kind of autonomous learning.
Has anyone tried implementing something like System M's meta-control switching in practice? Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.
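To make the question concrete, the naive version I keep sketching looks something like this (all the names and the learning-progress signal are my own guesses, not anything from the paper):

```python
import random

# Hypothetical sketch: a meta-controller (a stand-in for "System M") that picks
# between passive observation ("System A") and active exploration ("System B").
# The switching signal is assumed: track how much each mode has recently reduced
# prediction error, favor the mode with more recent progress, and keep a small
# forced-exploration probability so the controller doesn't collapse into one mode.

class MetaController:
    def __init__(self, epsilon=0.1, decay=0.9):
        self.epsilon = epsilon    # chance of trying the currently less-favored mode
        self.decay = decay        # how quickly old progress estimates fade
        self.progress = {"observe": 0.0, "act": 0.0}  # recent error reduction per mode

    def choose_mode(self):
        # Occasionally force the less-favored mode so its estimate stays fresh.
        if random.random() < self.epsilon:
            return min(self.progress, key=self.progress.get)
        return max(self.progress, key=self.progress.get)

    def update(self, mode, error_before, error_after):
        # Learning progress = how much prediction error dropped while in that mode.
        gain = error_before - error_after
        self.progress[mode] = self.decay * self.progress[mode] + (1 - self.decay) * gain
```

The epsilon term is the only thing stopping whichever mode gets an early lead from starving the other of updates, which feels too crude. Curious whether anyone has something better than learning-progress-plus-forced-exploration.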
Claude is learning very fast
There's already a model capable of autonomous learning on the small scale, just nobody's tried to scale it up yet: https://arxiv.org/abs/2202.05780
I remember a joke from a few years ago showing an "AI" that was "learning" on its "own", which in practice meant periodically retraining from scratch on a new training set curated by a large team of researchers, themselves relying on huge (far-away) teams of annotators.
TL;DR: it depends on where you define the boundaries of your "system".
by Emmanuel Dupoux, Yann LeCun, Jitendra Malik
"he proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration on how organisms adapt to real-world, dynamic environments across evolutionary and developmental timescales. "
The paper's critique of the 'data wall' and language-centrism is spot on. We’ve been treating AI training like an assembly line where the machine is passive, and then we wonder why it fails in non-stationary environments. It’s the ultimate 'padded room' architecture: the model is isolated from reality and relies on human-curated data to even function.
The proposed System M (Meta-control) is a nice theoretical fix, but the implementation is where the wheels usually come off. Integrating observation (A) and action (B) sounds great until the agent starts hallucinating its own feedback loops. Unless we can move away from this 'outsourced learning' where humans have to fix every domain mismatch, we're just building increasingly expensive parrots. I'm skeptical whether 'bilevel optimization' is enough to bridge that gap, or whether we're just adding another layer of complexity to a fundamentally limited transformer architecture.
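To be fair, the bilevel machinery itself isn't the exotic part. A toy version (every name and the synthetic data below are mine, not the paper's) is just an outer loop adjusting how the inner loop is trained:

```python
import numpy as np

# Toy bilevel optimization sketch (illustrative only, not the paper's method).
# The meta-parameter m mixes two data sources in the inner objective, loosely
# standing in for "learning from observation" vs. "learning from action".
# Inner loop: fit w under the current mix. Outer loop: adjust m so the fitted
# w does well on held-out data, via a finite-difference gradient.

rng = np.random.default_rng(0)
x_obs, x_act, x_val = rng.normal(size=(3, 200))
true_w = 2.0
y_obs = true_w * x_obs + rng.normal(scale=2.0, size=200)   # noisy "observation" data
y_act = true_w * x_act + rng.normal(scale=0.2, size=200)   # cleaner "interaction" data
y_val = true_w * x_val

def solve_inner(m, steps=200, lr=0.05):
    # Gradient descent on m * MSE_obs + (1 - m) * MSE_act with respect to w.
    w = 0.0
    for _ in range(steps):
        grad = (m * np.mean(2 * x_obs * (x_obs * w - y_obs))
                + (1 - m) * np.mean(2 * x_act * (x_act * w - y_act)))
        w -= lr * grad
    return w

def outer_loss(m):
    # How well the inner solution generalizes to held-out data.
    w = solve_inner(m)
    return np.mean((x_val * w - y_val) ** 2)

m = 0.5
for _ in range(30):
    eps = 1e-3
    grad_m = (outer_loss(m + eps) - outer_loss(m - eps)) / (2 * eps)
    m = float(np.clip(m - 0.5 * grad_m, 0.0, 1.0))  # outer step on the mixing weight

print(f"learned mixing weight m = {m:.2f}")  # typically drifts toward the cleaner source
```

The loop itself is the easy part. The problem is that in the real setting the 'validation signal' has to come from the agent's own interaction with the world, which is exactly where the hallucinated-feedback issue bites.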
"don't learn" might be a good feature from a business point of view
Imagine if the AI learns all your source code and then applies it for your competitor /facepalm
Can I run it?
Not learning from new input may be a feature. Back in 2016 Microsoft launched one that did, and after one day of talking on Twitter it sounded like 4chan.[1] If all input is believed equally, there's a problem.
Today's locked-down pre-trained models at least have some consistency.
[1] https://www.bbc.com/news/technology-35890188