Don't you need two 512GB ones to run the latest Chinese models unquantized?

knollimar • yesterday at 2:37 PM • 3 replies

Replies

For consumers, there's little reason to run unquantized, especially for large models, which take less of a hit from quantization. I'm running a 200B model at Q3 with very little degradation. A 1000B model would see even less change.
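
For a rough sense of the sizes involved, here is a back-of-the-envelope sketch of weight memory at different bit widths. It assumes nominal bits per weight (real quant formats carry some extra metadata), ignores KV cache and OS overhead, and uses decimal GB. The function names and the 512 GB per-machine figure are just illustrative.

```python
import math

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return params_billions * bits_per_weight / 8

def machines_needed(footprint_gb: float, ram_per_machine_gb: float = 512) -> int:
    """Minimum machine count whose combined RAM holds the weights alone."""
    return math.ceil(footprint_gb / ram_per_machine_gb)

if __name__ == "__main__":
    # Nominal bit widths only; actual quantized files are somewhat larger.
    for params in (200, 1000):
        for bits in (16, 8, 4, 3):
            gb = weight_footprint_gb(params, bits)
            print(f"{params}B @ {bits}-bit: ~{gb:.0f} GB, "
                  f"{machines_needed(gb)} x 512 GB machine(s)")
```

Treat the results as lower bounds: context length and runtime overhead push the real requirement higher.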

hu3 • yesterday at 5:50 PM

Yes, and the result of this $10k endeavour is a much slower and dumber model than any SoTA $20/mo API. That's on top of the maintenance burden of keeping software and models updated.

zitterbewegung • yesterday at 2:58 PM

Getting 512 GB of RAM at that price point is cheaper than everything else. That’s why Apple stopped production, to divert supply to the M5 Ultra.

