Hacker News

fulafel · today at 5:51 AM

You're right in terms of fitting your program to memory, so that it can run in the first place.

But in performance work, the speed of RAM relative to computation has dropped so much that it's common wisdom to treat today's cache as the RAM of old (and today's RAM as the disk of old, and so on).

In software performance work it's been all about hitting the cache for a long time. LLMs aren't too amenable to caching though.


Replies

makapuf · today at 6:53 AM

AFAIK you can't explicitly allocate cache the way you allocate RAM, though. It's a bit as if you could only work on files, with RAM used as a cache. Maybe I am mistaken? (Edit: typo)

seanmcdirmid · today at 6:47 AM

LLMs need memory bandwidth to stream lots of data through quickly; caching doesn't help much. This is basically the same way a GPU uses memory.
