I don't claim to understand how LLMs work, but I'm confident that anthropomorphizing their functions doesn't help an objective debate about their abilities.
Does a motor vehicle get "sleep" when it is serviced? When I reboot a computer, is that equivalent to a nap?
This is why I object to sleep() from unistd.h. What an anthropomorphizing notion. Didn't early Unix programmers understand that a computer isn't a living creature and therefore isn't capable of sleep? They must have been really stupid!
Anthropomorphization is not inherently wrong, and in some instances it actually lets you reason better about complex behavior than whatever convoluted (and often wrong, especially in the case of giant neural networks) mechanistic description one might conjure.
Here the analogy isn't without reason.
Just like LLM sleep has nothing to do with animal sleep, the neuron in a neural network has nothing to do with an actual neuron, and nobody should pretend they do.
I agree we need to be mindful of our metaphors, but they do help, both as inspiration for developing techniques and for naming things. The onus of keeping bias in check when using metaphors is on the reader; authors can't really do that for you. However, once bias is in check, you can have a very productive debate using these names, provided everyone is aware of their ontology.
Saying something needs sleep isn't anthropomorphizing, since pretty much all complex living organisms need sleep.
Also, even when something is "specific" to humans, it might not be anthropomorphizing to observe it in something else; it could just be an emergent pattern of high intelligence.
How do you concisely describe, for an audience of mixed education, a low-power state of an entity that processes, in which it has little to no reaction to input and may or may not be performing tasks?
Also keep in mind that most if not all devices with a chip have had a function called "sleep" for many years, without this argument.
One of the most common functions in programming is sleep(ms). There is wake, heartbeat, handshake, orphan, listen, starve, parent/child, etc.
This is not anything new; it's just a word that fits the function.
This is the struggle of naming papers. You could stretch definitions and make your own sexy headline or you could be precise and fewer people will read it.
I think it's interesting that folks are suddenly taking issue with "anthropomorphizing" language used in AI, as if we haven't been doing this since the earliest days of computing (see "memory", "child", "parent", etc.). It helps folks understand things at the correct level without needing domain knowledge.
> Does a motor vehicle get "sleep" when it is serviced?
That's more like a doctor visit and a workout. The sleep will be the part of the duty cycle when it's not operating.
> When I reboot a computer, is that equivalent to a nap?
Yes, it wakes up completely refreshed and in good working order, usually, and if there's still a problem you know you need a technician.
If it works, it's called bionics, not anthropomorphization ;)
I assume compacting is the sleep here, so yes.
I find this annoying too. "Sleep" is okay, but the quippy headlines ("need sleep"—short, snappy and vague) infiltrating journals bother me. I've seen it well before LLMs, but as an example, there is a long list of title snowclones of the famous attention paper: https://github.com/vinayprabhu/X-is-all-you-need.
> Does a motor vehicle get "sleep" when it is serviced?
One of the mayors of New York in the '80s (Koch?) famously doubled the city's bus fleet at zero cost by running the buses 24 hours a day, instead of letting them rest at the end of their shifts as the previous policy did.
> When I reboot a computer, is that equivalent to a nap?
I mean, you do put your computer into "sleep" mode and then "wake" it.
Analogies are useful. I think we need to learn how to keep benefiting from them despite the risk of anthropomorphization.
Very much agree that while it is useful for describing motivation and inspiration, it is very unhelpful, or worse, to use this language this way.
One might as well say "need neural plasticity" which is as much an analogy and equally misleading and counterproductive in shaping the right model of the system.
One might even call this pernicious: what it encourages is already a social problem, and it doesn't aid understanding; it confounds it.
The analogy is helpful, but yes, we should be able to “intelligently design” something better than sleep analogues, since we’re not constrained by evolution the way humans are.
See also, perhaps: https://news.ycombinator.com/item?id=48273597
Just from the title, I’m assuming it refers to a period of downtime used to perform some sort of maintenance on the knowledge held by the system.
Clicking through, that’s exactly what it is. Seems like “sleep” is an excellent term to use here.
> we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache
There is a strong, non-trivial connection here between what your brain does in sleep and what they are studying.
You wouldn't object to referring to robot eyes or robot legs.
... and anyway, maybe it was hungry? Or getting the sniffles?
They provide an explanation for using the term "sleep":
> In animals, the transfer from short-term memory to long-term memory is thought to be supported by hippocampal replay [33], especially during sleep [41]; in this phase, short-term hippocampal memories are reactivated and consolidated into cortical synaptic weights. Sleep makes animals unable to respond to external stimuli, suggesting that it must provide enough cognitive benefit to justify this cost [41]. Inspired by these biological processes, we propose a method for transferring context-window memory into persistent weights. When the model’s context window becomes full during inference, the model enters a “sleep” in which it performs multiple forward passes over the accumulated context and recursively updates its fast weights via a learned local rule. As in animal sleep, the model receives no external input tokens during this phase. After consolidation, the context window is cleared, and the model resumes operation with updated fast weights. During training, the model is optimized end-to-end by backpropagating through the entire process to maximize task performance after sleep.
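To make the quoted loop concrete, here is a toy sketch of the cycle: fill the context, "sleep" by replaying only the accumulated context several times while updating a fast-weight matrix, then clear the context. Everything here is hypothetical and not the paper's method: the class and parameter names are made up, tokens are plain vectors, and the update is a fixed Hebbian outer-product rule (associate each token with its successor) standing in for the paper's *learned* local rule, which is trained end-to-end with backprop.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer_add(W: Matrix, a: Vector, b: Vector, lr: float) -> None:
    """In place: W += lr * outer(a, b)."""
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += lr * a[i] * b[j]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Return W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToySleepingMemory:
    """Hypothetical toy: a bounded context buffer consolidated into fast weights."""

    def __init__(self, dim: int, context_size: int, sleep_passes: int = 2, lr: float = 0.5):
        self.context_size = context_size
        self.sleep_passes = sleep_passes
        self.lr = lr
        self.fast_weights: Matrix = [[0.0] * dim for _ in range(dim)]
        self.context: List[Vector] = []
        self.sleeps = 0

    def observe(self, x: Vector) -> None:
        """Append one input; when the context window is full, go to sleep."""
        self.context.append(list(x))
        if len(self.context) >= self.context_size:
            self.sleep()

    def sleep(self) -> None:
        # No external input here: only the accumulated context is replayed.
        for _ in range(self.sleep_passes):
            for prev, nxt in zip(self.context, self.context[1:]):
                # Fixed Hebbian rule (an assumption): strengthen prev -> nxt.
                outer_add(self.fast_weights, nxt, prev, self.lr)
        # After consolidation the context is cleared; the weights persist.
        self.context.clear()
        self.sleeps += 1

    def recall(self, x: Vector) -> Vector:
        """Read out what the fast weights associate with x."""
        return matvec(self.fast_weights, x)
```

Feeding it three one-hot vectors with `context_size=3` triggers one sleep. Afterwards the context is empty, yet `recall` of the first vector still points at the second. The sequence survives in the weights rather than in the window, which is the point the quoted passage is making.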