The most surprising part: the agent had access to both H100s and H200s. Without being told, it notic...

zhwu • today at 6:00 PM • 4 replies • view on HN

The most surprising part: the agent had access to both H100s and H200s. Without being told, it noticed H200s scored better and started screening ideas on H100s, then promoting winners to H200s for validation. That strategy emerged entirely on its own.

Replies

rogerrogerr • today at 6:13 PM

Why do we think this emerged “on its own”? Surely this technique has been discussed in research papers that are in the training set.

➕ show 1 reply

hhh • today at 6:33 PM

Why?… The experiment.yaml shows that it is calling h100/200 explicitly, it’s pretty common for humans to say “number bigger more gooder” for anything… Lie and reverse the values and see what happens. I would put money on a rabbit hole of complaining about it being misconfigured.

➕ show 1 reply

Aboutplants • today at 6:04 PM

Yeah I thought that was a particularly neat part

TheJord • today at 7:47 PM

[dead]

alt Hacker News

Replies