anon84873628 • yesterday at 11:00 PM • 0 replies

Right. Everyone is using this to judge the LLMs instead of questioning what situation they were actually fed and whether it was in fact the best move.

More likely, the simulation was just very poor and the results are nonsense.
