LLM inference is mostly read-only: the model weights are streamed, unmodified, on every generated token, so high-bandwidth flash looks like it could offer big cost savings over VRAM. It isn't in commercial products yet, but working prototypes exist (rough bandwidth math below). Previous HN discussion:
https://news.ycombinator.com/item?id=46700384
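A back-of-envelope sketch of why bandwidth is the crux here. All the numbers (model size, precision, decode speed) are illustrative assumptions, not measurements:

```python
# Memory bandwidth needed to stream model weights once per
# generated token -- the dominant read traffic during decoding.

params = 70e9           # assumed model size: 70B parameters
bytes_per_param = 2     # assumed FP16 weights
tokens_per_sec = 20     # assumed decode speed at batch size 1

bandwidth_gbps = params * bytes_per_param * tokens_per_sec / 1e9
print(f"required read bandwidth: {bandwidth_gbps:.0f} GB/s")
# -> 2800 GB/s, roughly HBM territory. A flash-based design would
# need heavy parallelism across many dies (or large batch sizes
# that amortize each weight read) to keep up.
```

The traffic really is almost all reads, which is why flash, whose weakness is write endurance, is an interesting fit.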