[under-the-rug stub]
[see https://news.ycombinator.com/item?id=45988611 for explanation]
Really well done article.
I'd note that when I gave the input/output screenshot to ChatGPT 5.2, it failed (with lots of colorful chain of thought), though Gemini got it right away.
Thanks for sharing; you clearly spent a lot of time making this easy to digest. I especially like the tokens-to-embedding visualisation.
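For anyone new to that step, the lookup itself is just indexing into a learned matrix. A minimal sketch (the sizes are made up, not taken from the article):

    import torch

    vocab_size, d_model = 50257, 768        # example sizes, roughly GPT-2
    embedding = torch.nn.Embedding(vocab_size, d_model)
    token_ids = torch.tensor([15496, 995])  # hypothetical token ids
    x = embedding(token_ids)                # one d_model vector per token: (2, 768)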
I recently had some trouble converting an HF transformer I trained with PyTorch to Core ML. I just couldn’t get the KV cache to work, which made it unusably slow after 50 tokens…
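For anyone hitting the same wall, the cache is what keeps each decode step cheap as the sequence grows. A minimal sketch of the difference using the Hugging Face transformers API (gpt2 is just a stand-in for whatever model you trained):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")    # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("Hello", return_tensors="pt").input_ids
    past = None
    for _ in range(50):
        # With a cache, only the newest token is fed each step;
        # without one, the whole prefix is re-encoded every step.
        step_input = ids if past is None else ids[:, -1:]
        with torch.no_grad():
            out = model(step_input, past_key_values=past, use_cache=True)
        past = out.past_key_values                 # reuse cached K/V
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)

If the Core ML conversion drops past_key_values, you fall back to re-encoding the whole prefix every step, which would explain the slowdown around 50 tokens.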
Amazing article. I was under the misapprehension that temperature and other sampling parameters affect caching. Turns out I was wrong, and this explains why beautifully.
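Concretely, temperature only rescales the logits after the forward pass, and the cached keys/values depend only on the input tokens. A rough sketch of where it actually enters:

    import torch

    def sample(logits: torch.Tensor, temperature: float) -> int:
        # Temperature divides the logits just before softmax; nothing
        # upstream of this point (and hence nothing cached) depends on it.
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()

Two requests that differ only in temperature can therefore share a prefix cache.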
Great work. Learned a lot!
Excellent HN-esque innovation in moderation: immediate improvement in S/N ratio, unobtrusive UX, gentle feedback to humans, semantic signal to machines.
How was the term "rug" chosen, e.g. in the historical context of newspaper folds?