Hacker News

sigbottle · yesterday at 11:27 PM · 4 replies

Oh wow, there's still work being done on Ampere?

I was wondering: I've been thinking about switching to AI systems programming (I know, an easy task), but from what I understand, datacenter cloud GPUs are where the demand is, right? Nobody's going to pay me (assuming I even had the skills) to optimize for consumer GPUs?

From what I understand, it's not just core counts + memory capacity + raw performance; the core primitives themselves differ. I don't think any of the consumer "Blackwell" chips, like the Grace-paired one or the RTX 5090, have, for example, SM pairs in their ISA? And likewise there are fundamental differences between consumer and datacenter Hopper (where the majority of the perf comes from the datacenter one's ISA?)
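For what it's worth, these splits are exposed through CUDA compute capability numbers (sm_80 for A100-class Ampere, sm_90 for Hopper, sm_100 for datacenter Blackwell, sm_120 for consumer Blackwell like the RTX 5090), and libraries typically dispatch on them. A minimal sketch of that kind of dispatch; the (major, minor) pairs come from NVIDIA's published compute capabilities, but the kernel-path names here are hypothetical placeholders:

```python
# Sketch: choosing a matrix-multiply code path by CUDA compute capability.
# (major, minor) pairs are real NVIDIA compute capabilities; the returned
# kernel names are hypothetical placeholders, not real library symbols.

ARCH_NAMES = {
    (8, 0): "Ampere datacenter (A100)",
    (8, 6): "Ampere consumer (RTX 30-series)",
    (9, 0): "Hopper (H100)",
    (10, 0): "Blackwell datacenter (B200/GB200)",
    (12, 0): "Blackwell consumer (RTX 50-series)",
}

def pick_mma_path(cc):
    """Pick an MMA code path for a (major, minor) compute capability."""
    if cc == (10, 0):
        # Datacenter Blackwell: 5th-gen tensor core (tcgen05) instructions,
        # including the paired-SM / 2-CTA MMA modes, are only exposed here.
        return "tcgen05_2cta_mma"      # hypothetical name
    if cc == (9, 0):
        # Hopper: warpgroup MMA with TMA-fed shared memory.
        return "wgmma_tma"             # hypothetical name
    # Everything else (Ampere, consumer Blackwell, ...): per-warp MMA.
    return "warp_mma_fallback"         # hypothetical name
```

This is also why "the same generation" doesn't mean the same ISA: consumer Blackwell (sm_120) falls through to the fallback path rather than sharing the datacenter (sm_100) instructions.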

So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?


Replies

coolsunglasses · today at 12:44 AM

I do CUDA for a living (not inference) and for the life of me (and a couple of LLMs for that matter) I cannot figure out what you mean by "SM pairs".

Do you mean the coupled dies on stuff like the B200? If so, note that a single NVIDIA die already has many SMs.

Do you mean TMEM MMA cooperative execution? I'm guessing that must be it given what the paper is about.

vlovich123 · today at 12:16 AM

Look at the email addresses. If you'll recall, there's an embargo on China.

storus · today at 12:06 AM

I still have 2x NVLinked A6000 and they aren't that bad compared to a single RTX 6000 Pro.

Maxious · yesterday at 11:49 PM

Yep, https://github.com/poad42/cuda-fp8-ampere is another recent attempt at squeezing whatever's left out of Ampere.