Hacker News

anon84873628 yesterday at 11:00 PM

Right. Everyone is using this to judge the LLMs instead of questioning what situation they were actually fed and whether their response was in fact the best move.

More likely, the simulation was just very poor and the results are nonsense.