Hacker News

knollimar yesterday at 2:37 PM (3 replies)

Don't you need two 512GB ones to run the latest Chinese models unquantized?


redman25 yesterday at 2:45 PM

For consumers, there's little reason to run unquantized, especially for large models, which take less of a hit from quantization. I'm running a 200B model at Q3 with very little degradation. A 1000B model would see even less change.
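
For a rough sense of the memory arithmetic here, a back-of-the-envelope sketch in Python. It counts weights only (no KV cache, activations, or runtime overhead), and the bit widths are nominal assumptions: real quantization formats also store scales and other metadata, so the effective bits per weight is usually somewhat above the nominal figure. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB.

    Assumption: every parameter is stored at the same nominal bit width.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare the model sizes mentioned in the thread at a few nominal bit widths.
for params in (200, 1000):
    for bits in (16, 8, 3):
        gb = weight_memory_gb(params, bits)
        print(f"{params}B params @ {bits}-bit: ~{gb:.0f} GB")
```

Under these assumptions a 200B model at a nominal 3 bits comes to about 75 GB of weights, while a 1000B model needs about 2000 GB at 16-bit and about 1000 GB at 8-bit, which is where the question about two 512GB machines comes from.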

hu3 yesterday at 5:50 PM

Yes, and the result of this $10k endeavour is a much slower and dumber model than any SoTA $20/mo API. On top of that, there's the maintenance burden of keeping software and models updated.

zitterbewegung yesterday at 2:58 PM

Getting 512 GB of RAM at that price point is cheaper than everything else. That’s why Apple stopped production to divert for the M5 Ultra.