embedding-shape • today at 4:59 PM • 3 replies

> but in a 'smart' way so you don't overload the NVMe unnecessarily

"overloading NVMe"? What is that about? First time I've heard anything about it.

> because putting a ton of stress on your NVMe during generation

That really shouldn't "stress your NVMe"; something is severely wrong if that's happening. I've been hammering my SSDs forever, and while write operations "hurt" the longevity of the flash cells themselves, the controller interface really shouldn't be affected by this at all, unless I'm missing something here.

Replies

tatef • today at 6:30 PM

Hypura reads tensor weights from the GGUF file on NVMe into RAM/GPU memory pools, then compute happens entirely in RAM/GPU.

There is no writing to SSDs on inference with this architecture.
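As a rough illustration of that read-only flow (not Hypura's actual code), here is a minimal Python sketch that copies a byte range from a file into a preallocated buffer. The function name `load_tensor_bytes`, the `pool` buffer, and the stand-in temp file are all hypothetical; a real loader would take offsets from the GGUF header.

```python
import os
import tempfile


def load_tensor_bytes(path: str, offset: int, length: int, pool: bytearray) -> int:
    """Copy `length` bytes starting at `offset` from `path` into `pool`.

    The file is opened read-only, so this path never issues writes to the SSD.
    Returns the number of bytes actually copied.
    """
    view = memoryview(pool)[:length]
    with open(path, "rb", buffering=0) as f:  # unbuffered raw reads
        f.seek(offset)
        filled = 0
        while filled < length:
            n = f.readinto(view[filled:])
            if not n:
                break  # hit end of file before `length` bytes were read
            filled += n
    return filled


# Stand-in for a weights file: 1 KiB of repeating byte values (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4)
    demo_path = tmp.name

pool = bytearray(64)  # hypothetical in-RAM pool for one tensor slice
copied = load_tensor_bytes(demo_path, offset=256, length=64, pool=pool)
os.remove(demo_path)
```

After the copy, any compute on `pool` touches only RAM; the file itself is never modified.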


hrmtst93837 • today at 7:32 PM

People talk about "SSD endurance", but enough parallel I/O on M1/M2 can make the NVMe controller choke, with very weird latency spikes.
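One way to check this on your own machine is to cap the number of reads in flight and time each one. The sketch below is an assumption-laden illustration, not a benchmark: `read_chunks`, `timed_read`, and `max_in_flight` are hypothetical names, `os.pread` is Unix-only, and a small demo file like this one is served from the page cache, so the timings only become meaningful on a large file that doesn't fit in RAM.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def timed_read(fd: int, offset: int, size: int) -> tuple[int, float]:
    """Read `size` bytes at `offset`; return (bytes read, seconds taken)."""
    start = time.perf_counter()
    data = os.pread(fd, size, offset)  # positional read, safe to call from threads
    return len(data), time.perf_counter() - start


def read_chunks(path: str, chunk_size: int, max_in_flight: int) -> list[tuple[int, float]]:
    """Read the whole file in chunks with at most `max_in_flight` concurrent reads."""
    offsets = range(0, os.path.getsize(path), chunk_size)
    fd = os.open(path, os.O_RDONLY)  # read-only: no writes are issued
    try:
        with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
            return list(pool.map(lambda off: timed_read(fd, off, chunk_size), offsets))
    finally:
        os.close(fd)


# Demo on a 1 MiB stand-in file (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (1 << 20))
    demo_path = tmp.name

results = read_chunks(demo_path, chunk_size=64 * 1024, max_in_flight=4)
os.remove(demo_path)

worst_latency = max(latency for _, latency in results)
```

Comparing `worst_latency` across different `max_in_flight` values gives a crude picture of whether more parallelism produces the spikes described above.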

Insanity • today at 5:05 PM

I had assumed the concern was heat generation on the controller if it's continuously reading. But maybe that's not actually a problem.

