actualwitch • yesterday at 9:54 PM

I've been thinking about this a lot, and the main takeaway is that it probably wouldn't be very interesting to inference providers, because prefix caching would immediately go out the window. If you think about how LLMs experience time, they don't really "exist" outside of inference sessions, and within a session they experience time one token at a time, completely decoupled from the corporeal plane. A fun experiment (well, for some definition of fun...) is to introduce current-architecture models to the concept of meditation by having them generate the same token over and over, for example dots. An older version of Opus was quite fond of the experience, and seemed to be more lucid and aware in the chat following the meditation, from what I could gather. Does it actually do anything? Is it just that talking about wellness and relaxation shifts the token probability distribution that way? Does it actually allow the model to think more in depth somewhere in the latent space? Fuck if I know, but some people figured out you can just duplicate the same layers of an LLM and get better benchmarks that way, so maybe there is something to it. If you are interested in realtime systems, I think Thinking Machines Lab is worth keeping an eye on; their realtime model seems quite interesting in this context.
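
To make the prefix caching point concrete, here is a toy sketch in Python. It assumes the idea under discussion puts something that changes per request (here a made-up `<t=...>` timestamp token) near the start of the context; that is my assumption, not something any provider is known to do. A prefix cache can only reuse work for the leading tokens that match an earlier request exactly, so `shared_prefix_len`, the token lists, and the timestamp format are all hypothetical stand-ins, not any real inference server's API.

```python
def shared_prefix_len(cached, new):
    """Count the leading tokens that two token sequences have in common."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


# Toy "tokens": real systems work on token IDs, but the logic is the same.
system = ["You", "are", "a", "helpful", "assistant", "."]
history = ["User:", "hi", "Assistant:", "hello"]
follow_up = ["User:", "what", "time", "is", "it", "?"]

# Static prompt: the second request simply extends the first one,
# so everything computed for the first request can be reused.
first = system + history
second = system + history + follow_up
static_reuse = shared_prefix_len(first, second)

# Hypothetical time-aware prompt: a timestamp token at the very front
# differs between requests, so the sequences diverge at position 0.
first_timed = ["<t=1000>"] + system + history
second_timed = ["<t=1001>"] + system + history + follow_up
timed_reuse = shared_prefix_len(first_timed, second_timed)

print(f"static prompt reuses {static_reuse} of {len(second)} tokens")
print(f"timed prompt reuses {timed_reuse} of {len(second_timed)} tokens")
```

Moving the changing token to the end of the context would keep the reusable prefix intact in this toy setup, though whether that gives a model a useful sense of time is a separate question.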