Hacker News

storus · yesterday at 5:55 PM

I have a Mac Studio with 512GB RAM, 2x DGX Spark, and an RTX 6000 Pro WS (planning to buy a few more in the Max-Q version next). I'm wondering whether we'll ever again see local inference as "cheap" as it is right now, given RAM/SSD price trends.


Replies

clusterhacks · yesterday at 6:49 PM

Good grief. I'm here cautiously telling my workplace to buy a couple of DGX Sparks for dev/prototyping, and you have better hardware in hand than my entire org.

What kind of experiments are you running? Did you try out exo with a DGX doing prefill and the Mac doing decode?

I'm also very interested in hearing what you've learned from working with all this gear. Did you buy it all out of pocket?
