Hacker News

Tzt, last Friday at 4:48 PM (2 replies)

Yes, the absolute majority of new models use CoTs: long chains of reasoning you don't see.

Some of them also use a weird style of talking in their CoTs, e.g.:

o3 talks about watchers, marinade, and cunning schemes: https://www.antischeming.ai/snippets

gpt5 gets existential about seahorses: https://x.com/blingdivinity/status/1998590768118731042

I remember one where gpt5 spontaneously wrote a poem about deception in its CoT and then resumed as if nothing weird had happened, but I can't find mentions of it now.


Replies

DenisM, last Friday at 5:45 PM

> But the user just wants answer; they'd not like; but alignment.

And there it is, the root of the problem. For whatever reason the model is very keen to produce an answer that "they" will like. This desire to please is intrinsic, but alignment is extrinsic.

DenisM, last Friday at 5:35 PM

Gibberish can be the model using contextual embeddings. Those aren't supposed to make sense to a human reader.

Or it could be trying to develop its own language to avoid detection.

The deception part is spooky too. It's probably learning that from dystopian AI fiction. Which raises the question of whether models can acquire injected goals from the training set.