
zozbot234 • yesterday at 11:26 PM • 1 reply

It could run viably with SSD offload on Macs with very little memory. You could even exploit batching to make the model almost compute-limited even in that challenging setting, since the KV cache is extremely small (for non-humongous context). In fact, if that approach can be made to work, I'd like to see a comparison between DS4 Flash and Pro on the same (Mac) hardware.

Replies

Havoc • yesterday at 11:54 PM

>It could run viably with SSD offload on Macs with very little memory

Not really. That's going to land you somewhere in the 0.2-0.5 tokens per second range.

Lovely as modern NVMe drives are, they're not memory.
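
For anyone who wants to put numbers on the disagreement above, here is a rough back-of-envelope sketch. It assumes decode is purely bound by streaming weights from the SSD on every step, and that every sequence in a batch reuses the same streamed weights (optimistic for a mixture-of-experts model, where different tokens can route to different experts). The function name and all figures are hypothetical placeholders, not measurements of DS4 or of any particular Mac.

```python
def decode_tokens_per_second(bytes_streamed_per_step,
                             ssd_bytes_per_second,
                             batch_size=1,
                             compute_limit_tokens_per_second=None):
    """Rough upper bound on aggregate decode throughput with SSD offload.

    Assumption: each decode step must stream `bytes_streamed_per_step` of
    weights from the SSD, and that single read serves every sequence in
    the batch. KV cache traffic is ignored.
    """
    if bytes_streamed_per_step <= 0 or ssd_bytes_per_second <= 0:
        raise ValueError("byte counts and bandwidth must be positive")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # One step per full weight read; each step emits one token per sequence.
    steps_per_second = ssd_bytes_per_second / bytes_streamed_per_step
    io_bound = steps_per_second * batch_size

    # Optionally cap by whatever the chip can compute (also a guess here).
    if compute_limit_tokens_per_second is not None:
        return min(io_bound, compute_limit_tokens_per_second)
    return io_bound


# Placeholder inputs: 10 GB streamed per step, 5 GB/s sustained SSD reads.
single = decode_tokens_per_second(10e9, 5e9)
batched = decode_tokens_per_second(10e9, 5e9, batch_size=16)
print(single, batched)
```

With those placeholder inputs, a single stream sits at the top of the 0.2-0.5 tokens per second range, while batching raises only the aggregate figure; each individual sequence still advances at the single-stream rate. How close a real setup gets depends on how much of the weights actually has to be re-read per step, which this sketch does not model.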

