
firstbabylonian today at 3:01 PM

> SSD streaming to GPU

Is this solution based on what Apple describes in their 2023 paper 'LLM in a flash' [1]?

1: https://arxiv.org/abs/2312.11514


Replies

simonw today at 3:10 PM

Yes. I collected some details here: https://simonwillison.net/2026/Mar/18/llm-in-a-flash/

zozbot234 today at 3:33 PM

A similar approach was recently featured here: https://news.ycombinator.com/item?id=47476422, though the iPhone Pro has very limited RAM (12GB total), which you still need for the active part of the model. (Unless you want to use wearout-resistant storage like Intel Optane, but that was power-hungry and thus unsuitable for a mobile device.)

foobiekr today at 4:11 PM

This is not entirely dissimilar to what Cerebras does with their weight streaming.
