> the chains start in English but slowly descend into Neuralese
What is Neuralese? I tried searching for a definition but it just turns up a bunch of Less Wrong and Medium articles that don't explain anything.
Is it a technical term?
There are two things called neuralese:
1) Internally, in latent space, LLMs use what is effectively a language, but all the words are written on top of each other instead of separately, and if you decode it as letters it sounds like gibberish, even though it isn't. It's just a much denser language than any human language. This makes those intermediate thoughts unreadable ... and thus "hides the intentions of the LLM", if you want to make it sound dramatic and evil. But yeah, we don't know what the intermediate thoughts of an LLM sound like.
The decoded version is often referred to as "neuralese".
2) If two LLMs with sufficiently similar latent spaces (e.g. the same model) communicate with each other, it has often been observed that they switch to "gibberish", BUT when tested they are clearly still passing meaningful information to one another. One assumes they are using tokens more efficiently, steering the latent-space information to a specific point rather than bothering with words. Think of it like this: the thought of an LLM is a 3d point (in reality ~2000d, but ...). Every token/letter is a 3d vector (meaning you add them), chosen so that words add up to the thought that is their meaning. But when outputting text, why bother with words? You can reach any thought/meaning by combining vectors: just keep picking the letter that moves you the most in the right direction. Much faster.
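The greedy "pick the letter moving the most in the right direction" idea above can be sketched in a few lines. Everything here is made up for illustration: a toy 3-d "thought" space standing in for a real model's ~2000-d hidden space, and random vectors for each letter. The point is just that the resulting message optimizes distance in vector space, not readability:

```python
import random

random.seed(0)
DIM = 3  # toy stand-in for a real LLM's high-dimensional latent space

# Made-up vocabulary: each "letter" gets a random vector.
vocab = {ch: [random.uniform(-1, 1) for _ in range(DIM)]
         for ch in "abcdefghijklmnopqrstuvwxyz"}

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def greedy_encode(thought, max_tokens=10):
    """Greedily pick letters so their vector sum approaches `thought`."""
    total = [0.0] * DIM
    message = []
    for _ in range(max_tokens):
        # Choose the letter whose vector moves the running sum
        # closest to the target thought.
        best = min(vocab, key=lambda ch:
                   dist([t + v for t, v in zip(total, vocab[ch])], thought))
        new_total = [t + v for t, v in zip(total, vocab[best])]
        if dist(new_total, thought) >= dist(total, thought):
            break  # no letter gets us any closer; stop
        total = new_total
        message.append(best)
    return "".join(message), total

thought = [0.7, -1.2, 0.4]
msg, reached = greedy_encode(thought)
print(msg)  # a short string of "gibberish" letters, not a word
print(dist(reached, thought))  # small residual error
```

The output string carries the target point quite precisely, but since no letter was chosen for spelling, it reads as nonsense to a human, which is roughly the claim being made about model-to-model "neuralese".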
Btw: some humans (usually toddlers, or children who are related) also switch to talking gibberish with each other while still clearly communicating. This is especially often observed in children who initially learn language together. Might be the same thing.
These languages are called "neuralese".
I suppose it means LLM gibberish
EDIT: orbital decay explained it pretty well in this thread
The author might use it as an analogy to mentalese, but for neural networks.
https://en.wiktionary.org/wiki/mentalese
EDIT: After reading the original thread in more detail, I think some of the sibling comments are more accurate. In this case, neuralese is more like a language of communication used between neural networks, rather than their internal representation.
Neuralese is a term first used in neuroscience to describe the internal coding or communication system within neural systems.
It originally referred to the idea that neural signals might form an intrinsic "language" representing aspects of the world, though these signals gain meaning only through interpretation in context.
In artificial intelligence, the term now has a more concrete role, referring to the learned communication protocols used by multi-agent systems.
It's a term somewhat popularized by the LessWrong/rationalism community to refer to communication (self-communication/note-taking/state-tracking/reasoning, or model-to-model communication) via abstract latent space information rather than written human language. Vectors instead of words.
One worry driving its popularity on LessWrong is that malicious AI agents might hide bad intent and actions by communicating in a dense, indecipherable way while presenting only normal intent and actions in their natural-language output.