Hacker News

jms55 · 01/21/2025 · 4 replies · view on HN

PyTorch and Jax, good to know.

Why do they have ROCm/CUDA backends in the first place though? Why not just Vulkan?


Replies

currymj · 01/21/2025

it's an interesting question. the unhelpful answer is Vulkan didn't exist when TensorFlow, PyTorch (and Torch, its Lua-based predecessor) were taking off and building GPU support. Apparently PyTorch did at one point prototype a Vulkan backend but abandoned it.

My own experience is that half-assed knowledge of C/C++, and a basic idea of how GPUs are architected, is enough to write a decent custom CUDA kernel. It's not that hard to do. No idea how I would get started with Vulkan, but I assume it would require a lot more ceremony, and that writing compute shaders is less intuitive.
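
For a sense of how little ceremony that actually is, here is a minimal sketch of a custom CUDA kernel -- a standard saxpy (y = a*x + y), my own illustration rather than anything from the thread:

    // saxpy.cu -- each thread computes one element of y = a*x + y.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        // global index of this thread across all blocks
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {              // guard against the partial last block
            y[i] = a * x[i] + y[i];
        }
    }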

there is also definitely a "worse is better" effect in this area. there are some big projects that tried to be super general and cover all use cases and hardware, but a time-crunched PhD student or IC just needs something they can use now. (even TensorFlow, which was relatively popular compared to some other projects, fell victim to this.)

George Hotz seems like a weird guy in some respects, but he's 100% right that in ML it is hard enough to get anything working at all under perfect conditions; you don't need to be fighting libraries and build tools on top of that, or carrying the mental overhead of learning this beautiful general API that supports 47 platforms you don't care about.

except also, "worse is better" is better -- e.g. because they were willing to make breaking changes and sacrifice some generality, Jax was able to build something really cool and innovative.

omcnoe · 01/21/2025

CUDA has a first-mover advantage, and it gives library maintainers a simpler, higher-level compute API than Vulkan does.
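
To make "higher level" concrete, a sketch of the host side using the CUDA runtime API (assuming the saxpy kernel sketched above sits in the same .cu file): allocate, copy, launch, copy back. The Vulkan equivalent needs an instance, a device, a queue, descriptor sets, a pipeline, and command buffers before the first dispatch.

    #include <cuda_runtime.h>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);   // host data

        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));            // device buffers
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // launch enough 256-thread blocks to cover all n elements
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

        cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }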

pjmlp · 01/21/2025

Vulkan doesn't do C++ -- you write GLSL or HLSL instead -- and the few prototypes that compile C++ to SPIR-V lack good tooling.