This is a surprisingly good read on how LLMs work in general.
A really clear explanation!
So if I were running a provider, I would be caching popular question prefixes across all users. There must be so many questions that start with 'what is' or 'who was', etc.
Also, can subsequences in the prompt be cached and reused? Or is it only prefixes? I mean, can you cache popular phrases that might appear in the middle of the prompt and reuse them somehow, rather than needing to iterate through them token by token? E.g. there must be lots of times that "and then tell me what" appears in the middle of a prompt.
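My rough mental model of why it's prefixes only (toy Python sketch, names are mine, not any real engine's API): the KV entries for a token depend on every token before it, so the same phrase in the middle of two different prompts has two different KV states.

    # Toy prefix cache: token-id prefix -> precomputed KV state.
    kv_cache: dict[tuple[int, ...], object] = {}

    def lookup_longest_prefix(tokens: list[int]):
        """Return (length, state) for the longest cached prefix of tokens."""
        for end in range(len(tokens), 0, -1):
            state = kv_cache.get(tuple(tokens[:end]))
            if state is not None:
                return end, state  # resume prefill from position `end`
        return 0, None  # miss: compute everything from scratch

    # Mid-prompt reuse fails because attention is causal: the KV values
    # for "and then tell me what" at position 500 in one prompt differ
    # from the same phrase at position 80 in another, so there is no
    # context-independent entry to share.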
When will Microsoft do this sort of thing?
It's a pain having to tell Copilot "Open in pages mode" each time it's launched, and then after processing a batch of files run into:
https://old.reddit.com/r/Copilot/comments/1po2cuf/daily_limi...
It was a real facepalm moment when I realised we were busting the cache on every request by including the date and time near the top of the main prompt.
Even just moving it to the bottom shifted a lot of our usage into the cache.
Probably went from something like 30-50% cached tokens to 50-70%.
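The fix was basically just ordering: everything static goes first so it stays a byte-identical prefix, and anything per-request goes last. Roughly this shape (hypothetical names, not our real code):

    from datetime import datetime, timezone

    SYSTEM_PROMPT = "...long static instructions, tools, examples..."

    def build_prompt(user_message: str) -> str:
        now = datetime.now(timezone.utc).isoformat()
        # Before: the timestamp sat near the top, so every request had a
        # unique prefix and nothing after it could ever hit the cache.
        # After: static part first, dynamic part last.
        return f"{SYSTEM_PROMPT}\n{user_message}\nCurrent time: {now}"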
I gave the table of inputs and outputs to both Gemini 3.0 flash and GPT 5.2 instant and they were stumped.
https://t3.chat/share/j2tnfwwful https://t3.chat/share/k1xhgisrw1
What a fantastic article! How did you create the animations?
Took me a minute to see it's the same Ngrok that provided freemium tunnels to localhost. How did they adapt to the AI revolution?
The blog starts loading and then displays a "Something Went Wrong. D is not a function" error.
Link seems to be broken: the content briefly loads, then is replaced with "Something Went Wrong" and "D is not a function". It stays broken even with adblock disabled.
Does anyone know whether the cache is segregated by user/API key for the big providers?
I was looking at modifying outgoing requests via a proxy and wondering whether that harms caching. Common coding tools presumably share a prompt across all their installs, so a universal cache would save a lot.
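For context on why the proxy worries me: as I understand it, some engines key the cache with chained block hashes (I believe vLLM's automatic prefix caching works roughly like this), so injecting even one token near the top invalidates every block after it. Toy sketch, not any real API:

    import hashlib

    BLOCK = 16  # tokens per cache block (toy value)

    def block_hashes(tokens: list[int]) -> list[str]:
        """Each block's key folds in the previous block's hash, so any
        upstream edit changes every downstream key."""
        hashes, prev = [], ""
        for i in range(0, len(tokens), BLOCK):
            blob = (prev + ",".join(map(str, tokens[i:i + BLOCK]))).encode()
            prev = hashlib.sha256(blob).hexdigest()
            hashes.append(prev)
        return hashes

    # A proxy that rewrites the shared prompt shifts the block boundaries
    # and breaks the hash chain, so none of the keys match the cache
    # entries built from the unmodified prompt.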