Hacker News

ismailmaj, today at 1:20 PM (1 reply)

Three things: they can, there is precedent for it in Google v. Oracle over the Java APIs, and they already have something!

AMD engineered something called HIP, a set of CUDA-API-compatible libraries that target AMD's hardware. It's the closest thing we have to a drop-in replacement for Nvidia's software moat.
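To make "CUDA API compatible" concrete, here is a toy sketch of the renaming idea. It assumes a small hand-picked subset of the CUDA-to-HIP name mappings; `toy_hipify` is a hypothetical helper for illustration, not AMD's actual porting tooling, which covers far more than this.

```python
import re

# Hand-picked subset of CUDA -> HIP renames (illustrative only;
# the real porting tools handle a much larger API surface).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Try longer names first so cudaMemcpyHostToDevice is not
# partially rewritten as cudaMemcpy + "HostToDevice".
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def toy_hipify(source: str) -> str:
    """Rewrite known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

hip_snippet = toy_hipify(cuda_snippet)
print(hip_snippet)
```

The rename is the easy part. Getting the renamed code to perform like the tuned original is where the trouble starts.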

It works for simple stuff but loses badly on frontier kernels (like Flash Attention 3), novel approaches (e.g. Mamba), and networking (e.g. NCCL). It is also rough around the edges, so what you save on GPU costs you lose in engineering costs.

My previous company (Rivos) tried to compete in this GPU game while putting effort into a good software stack: a cheaper drop-in replacement with decent software.

But that vision was rough. Any new player had to implement the bad APIs for backward compatibility, following the specs wasn't sufficient because a lot of the AI stack depended on observable behavior (Hyrum's Law), and Nvidia simply had a long head start. The company is now dead (acquired by Meta), and AFAIK there isn't another player.
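For anyone who hasn't run into Hyrum's Law, here is a minimal made-up sketch of the failure mode. Every name in it (`top_k_incumbent`, `top_k_newcomer`, `pick_best`) is hypothetical and not taken from any real library: two implementations honor the same documented contract, yet a caller that leaned on undocumented ordering gets a different answer.

```python
import heapq


def top_k_incumbent(scores, k):
    """Contract: indices of the k largest scores, order unspecified.

    Observable behavior: happens to return them best-first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_newcomer(scores, k):
    """Same contract, but returns the indices in ascending index order."""
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(best)


def pick_best(top_k, scores):
    # Downstream code that silently assumes "first element is the argmax",
    # which the contract never promised.
    return top_k(scores, 2)[0]


scores = [0.1, 0.7, 0.2, 0.9]

# Both implementations return the same set of indices, as the contract requires.
same_set = set(top_k_incumbent(scores, 2)) == set(top_k_newcomer(scores, 2))

# The caller still gets different answers depending on the implementation.
best_with_incumbent = pick_best(top_k_incumbent, scores)
best_with_newcomer = pick_best(top_k_newcomer, scores)
print(same_set, best_with_incumbent, best_with_newcomer)
```

The newcomer is spec-compliant and still breaks the caller, so in practice it has to match the incumbent's quirks too.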

Best-case scenario, AMD puts more effort into their software stack, but I just don't think they have enough internal talent to compete.

Training will continue to be an Nvidia thing, and that's where most of the money sits, unless the AI research scene suddenly pivots to JAX. I do not see that coming any time soon; if anything, I've seen internal efforts at Google to make PyTorch work nicely with TPUs. Some players like Anthropic have started using JAX for training, but all the small players are using Nvidia. I'm guessing it has something to do with Nvidia partnering aggressively with startups.


Replies

HarHarVeryFunny, today at 5:41 PM

I think AMD have essentially given up on the consumer / small-scale GPU compute market, while being extremely successful selling their AI chips to much bigger customers. Some of the biggest supercomputers (clusters) in the world, such as the Lawrence Livermore and Oak Ridge exascale computers, are AMD Instinct based, but the tools and level of support those customers get are not going to be the same as what someone at home gets when trying to run ROCm on their gaming card.

I wonder how big the consumer / small-scale market is compared with these massive installations?