robot-wrangler • today at 2:47 AM • 2 replies

> Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.

If you like biomimetic approaches to computer science, there's evidence that we want something besides neural networks. Whether we call such secondary systems emotions, hormones, or whatnot doesn't matter much as long as the dynamics are useful. It seems at least possible that studying alignment-related topics will get us closer than any perspective focused purely on learning. Coincidentally, Quanta has a piece on some related topics today: https://www.quantamagazine.org/once-thought-to-support-neuro...

Replies

fallous • today at 3:57 AM

The question is whether this eventually leads us back to genetic programming, and whether we can adequately avoid the over-fitting to specific hardware that tended to crop up in the past.

t-writescode • today at 3:38 AM

Or possibly “in addition to”, yeah. I think this is where it needs to go. We can’t keep training HUGE neural networks every 3 months, then throw out all the work we did and the billions of dollars in gear and training just to use another model a few months later.

That loop is unsustainable. Active learning needs to be discovered / created.
