Hacker News

fulafel · today at 5:51 AM

You're right in terms of fitting your program to memory, so that it can run in the first place.

But in performance work, the speed of RAM relative to computation has dropped so much that it's common wisdom to treat today's cache as the RAM of old (and today's RAM as the disk of old, and so on).

In software performance work it's been all about hitting the cache for a long time. LLMs aren't too amenable to caching though.


Replies

makapuf · today at 6:53 AM

AFAIK you can't explicitly allocate cache the way you allocate RAM, though. It's a bit as if you could only work on files, with RAM used as a cache. Maybe I am mistaken? (Edit: typo)

seanmcdirmid · today at 6:47 AM

LLMs need memory bandwidth to stream lots of data through quickly; caching doesn't help much. This is basically the same way a GPU uses memory.
