Wait, this isn't real, is it? Is there actually an intermediate model that translates DeepSeek's thinking from its "alien language" into human languages? That's not actually the case, right?
I thought "thinking" is literally the model generating additional text in a human language that shows its "thought process". It's added to the model's context, which helps it reason better because it now has this self-generated context.
The "their own language" idea seems to come from some recent science fiction where LLMs develop their alien language and take over the world by 2037 or something.
Current models simply generate additional text that gets added to the context for the trace. However iterative models that "think" by repeatedly looping through several layers instead of outputting text have recently been demonstrated.
You're right, it's just additional text that allows it to do thinking / reasoning-like behavior. The big proprietary models hide the real output from the user and instead provide a friendly abridged version, but that's just to protect their secret sauce from distillation.
The parent is off, you’re right. They may reason in any language, typically whatever the user’s language is, and you’ll see the reasoning directly with an open model like Deepseek.
Research only showed that thinking might be disconnected from the final output but in my experience they are very strongly correlated in recent models
Yeah, it's actually the case. Researchers have shown that the models response doesn't always follow from the reasoning. Whether you consider that an internal language or not really depends on what you're speculating the neural network is doing. I think there was an Antropic paper on it.