Hacker News

arjie today at 5:39 PM

I have a little note from the past about a thinking trace[0] in which DeepSeek R1 produces something like this:

    (Dimethyl(oxo)-lambda6-sulfa雰囲idine)methane donate a CH2rola group occurs in reaction, Practisingproduct transition vs adds this.to productmodule. Indeed"come tally said Frederick would have 10 +1 =11 carbons. So answer q Edina is11.
And then it arrives at the 'right'[1] answer to a chemistry question. If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet. I talked to the author a while ago but forgot to follow up, since his paper was going to come out at NIPS or something; if someone else finds it, maybe they can share.

0: https://wiki.roshangeorge.dev/w/Blog/2025-10-12/Word_Magic#I...?

1: In the sense of true belief, I suppose


Replies

ekidd today at 6:06 PM

> If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet.

Yes, several models think in weird jargon. Here is an example of Mythos's thinking while playing solitaire: https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illeg...

> 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR--—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig--—-BREAK:-9♥

This is a small step toward something called "neuralese", where the model stops thinking in English and instead thinks in its internal vector spaces. Since this still gets serialized through text, it isn't quite true neuralese, but it's moving in that direction.

I mean, I'm sympathetic towards the models. My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.
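A toy sketch of why "serialized through text" is the catch in the neuralese point: each reasoning step has to collapse a continuous vector into one discrete token, and whatever the token can't express is lost. The three-token vocabulary and its embeddings below are entirely made up for illustration; real models have far larger vocabularies and embeddings.

```python
import math

# Hypothetical toy vocabulary: each "token" has a made-up 3-dim embedding.
VOCAB = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def serialize(hidden):
    """Collapse a continuous hidden vector to the single best-matching token."""
    return max(VOCAB, key=lambda tok: dot(hidden, VOCAB[tok]))


def deserialize(token):
    """Read the token back as its embedding vector."""
    return VOCAB[token]


def round_trip_error(hidden):
    """Euclidean distance between the hidden state and what survives as text."""
    return math.dist(hidden, deserialize(serialize(hidden)))


# A "thought" that is mostly red with a fair amount of blue mixed in.
thought = (0.7, 0.1, 0.6)
print(serialize(thought))                   # red
print(round(round_trip_error(thought), 3))  # 0.678
```

The blue component simply vanishes once the step is written out as "red"; a model that could pass the raw vector forward would not have that bottleneck.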

drdaeman today at 7:06 PM

Isn't that just token noise from a broken implementation or model quantization? I've had models spew out nonsense like that, and every time it was either a bug in llama.cpp or a messed-up .gguf file.
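A toy sketch of how quantization alone can change output: when two candidate tokens have close logits, rounding the weights can flip which one greedy decoding picks. This assumes naive per-tensor symmetric quantization and made-up weights; real .gguf formats generally use per-block scales, so this exaggerates the effect.

```python
# Hypothetical toy model: two candidate tokens, each with a 2-dim weight row.
WEIGHTS = {
    "A": [0.50, 0.27],
    "B": [0.39, 0.39],
}
HIDDEN = [1.0, 1.0]  # made-up hidden state


def quantized_weights(weights, bits):
    """Round every weight to a signed `bits`-bit integer using one shared
    (per-tensor) scale, then map it back to a float (dequantize)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for row in weights.values() for v in row) / qmax
    return {
        tok: [max(-qmax, min(qmax, round(v / scale))) * scale for v in row]
        for tok, row in weights.items()
    }


def greedy_token(weights, hidden):
    """Pick the token whose logit (dot product with the hidden state) is largest."""
    logits = {tok: sum(w * h for w, h in zip(row, hidden))
              for tok, row in weights.items()}
    return max(logits, key=logits.get)


print(greedy_token(WEIGHTS, HIDDEN))                        # B (full precision)
print(greedy_token(quantized_weights(WEIGHTS, 8), HIDDEN))  # B
print(greedy_token(quantized_weights(WEIGHTS, 4), HIDDEN))  # A
```

At 8 bits the choice survives; at 4 bits with this naive scheme it flips. Repeated over many decoding steps, flips like this could plausibly compound into garbled text, which is why a bad quant or a buggy kernel is worth ruling out before reading meaning into odd traces.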