logoalt Hacker News

zhangchentoday at 1:38 AM2 repliesview on HN

Has anyone tried implementing something like System M's meta-control switching in practice? Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.


Replies

robot-wranglertoday at 2:47 AM

> Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.

If you like biomimetic approaches to computer science, there's evidence that we want something besides neural networks. Whether we call such secondary systems emotions, hormones, or whatnot doesn't really matter much if the dynamics are useful. It seems at least possible that studying alignment-related topics is going to get us closer than any perspective that's purely focused on learning. Coincidentally quanta is on some related topics today: https://www.quantamagazine.org/once-thought-to-support-neuro...

show 2 replies
claud_iatoday at 10:02 AM

[dead]