logoalt Hacker News

reilly3000yesterday at 10:11 PM4 repliesview on HN

Which takes a $20k thunderbolt cluster of 2 512GB RAM Mac Studio Ultras to run at full quality…


Replies

0xbadcafebeetoday at 12:46 AM

Most benchmarks show very little improvement of "full quality" over a quantized lower-bit model. You can shrink the model to a fraction of its "full" size and get 92-95% same performance, with less VRAM use.

show 1 reply
bigyabaitoday at 12:23 AM

"Full quality" being a relative assessment, here. You're still deeply compute constrained, that machine would crawl at longer contexts.

PlatoIsADiseaseyesterday at 11:46 PM

[flagged]

show 3 replies
teaearlgraycoldyesterday at 10:13 PM

Which while expensive is dirt cheap compared to a comparable NVidia or AMD system.

show 2 replies