sohex • yesterday at 9:31 PM • 3 replies

Sonnet, GPT-5.2, Gemini Flash, in a set of 21 games, where conclusions are drawn from the LLMs' self-reported reasoning.

This is like writing a paper about kids in a literal sandbox fighting over ‘territory’.

The models employed don’t represent the actual extent of machine reasoning, even as we currently recognize it. They certainly don’t have the metacognition necessary to accurately understand their own reasoning. As we’ve seen with recent papers on how LLMs do math, there’s a complete disconnect between the actual and the reported mechanism.

“Chilling” shouldn’t be the takeaway here.

Replies

motoxpro • yesterday at 11:07 PM

So, in the context you just laid out, consider this: "Artificial Intelligence Strategy for the Department of War" https://media.defense.gov/2026/Jan/12/2003855671/-1/-1/0/art...

Regardless of the models' actual capabilities, they will be used in every situation possible.

DaiPlusPlus • yesterday at 10:35 PM

> “Chilling” shouldn’t be the takeaway here.

It is when you consider the personality currently occupying the office of US SecDef.

shimman • yesterday at 10:40 PM

LLMs have already been used to bomb schoolgirls; "chilling" is absolutely the operative word here. Especially since these delusional fools want to incorporate LLMs into everything.
