
CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU

209 points by ashvardanian · 04/23/2025 · 41 comments

Comments

rfoo · 04/24/2025

I'd recommend having a "gemm with a twist" [0] example in the README.md instead of the element-wise example. It's pretty hard to evaluate how helpful this is for AI otherwise.

[0] For example, gemm but the lhs is in fp8 e4m3 and rhs is in bf16 and we want fp32 accumulation, output to bf16 after applying GELU.
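
To make that request concrete, here is a minimal plain-Rust CPU reference for the semantics in [0]. It is not CubeCL code: the helper names are hypothetical, the e4m3/bf16 conversions are hand-rolled simplifications, and GELU uses the tanh approximation. A real kernel would use hardware conversions, tiling, and tensor cores; this sketch only pins down the numerics (decode widths, fp32 accumulation, GELU epilogue, bf16 downcast).

```rust
// Hypothetical CPU reference for "gemm with a twist": lhs in fp8 e4m3,
// rhs in bf16, fp32 accumulation, GELU applied, output stored as bf16.

fn e4m3_to_f32(bits: u8) -> f32 {
    // OCP E4M3FN layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((bits >> 3) & 0x0F) as i32;
    let man = (bits & 0x07) as f32;
    if exp == 0 {
        sign * man * 2f32.powi(-9) // subnormal: man/8 * 2^-6
    } else if exp == 15 && man == 7.0 {
        f32::NAN // the only NaN encoding; e4m3fn has no infinities
    } else {
        sign * (1.0 + man / 8.0) * 2f32.powi(exp - 7)
    }
}

fn bf16_to_f32(bits: u16) -> f32 {
    // bf16 is the top 16 bits of an f32.
    f32::from_bits((bits as u32) << 16)
}

fn f32_to_bf16(x: f32) -> u16 {
    // Truncating conversion; production code would round to nearest even.
    (x.to_bits() >> 16) as u16
}

fn gelu(x: f32) -> f32 {
    // tanh approximation of GELU.
    0.5 * x * (1.0 + (0.797_884_6 * (x + 0.044_715 * x * x * x)).tanh())
}

/// out[m][n] = bf16( gelu( sum_k f32(lhs[m][k]) * f32(rhs[k][n]) ) ), row-major.
fn gemm_fp8_bf16_gelu(lhs: &[u8], rhs: &[u16], out: &mut [u16], m: usize, k: usize, n: usize) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32; // fp32 accumulation
            for p in 0..k {
                acc += e4m3_to_f32(lhs[i * k + p]) * bf16_to_f32(rhs[p * n + j]);
            }
            out[i * n + j] = f32_to_bf16(gelu(acc));
        }
    }
}
```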

kookamamie · 04/24/2025

This reminds me of Halide (https://halide-lang.org/).

In Halide, the concept was great, yet the hard parts of kernel development were just moved to the "scheduling" side, i.e. deciding the tiling/vectorization/parallelization for the kernel runs.
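
As a rough plain-Rust illustration of that algorithm/schedule split (not Halide code, and the function name and single tile parameter are assumptions for the sketch): the tile size below acts like a schedule knob, changing performance but never the result, which is the kind of decision Halide leaves to the scheduler.

```rust
/// Tiled matmul over square row-major matrices: c += a * b.
/// `tile` (>= 1) only affects cache behavior, not the computed values.
fn matmul_tiled(a: &[f32], b: &[f32], c: &mut [f32], n: usize, tile: usize) {
    for ii in (0..n).step_by(tile) {
        for kk in (0..n).step_by(tile) {
            for jj in (0..n).step_by(tile) {
                // Compute one tile; bounds clamp at the matrix edge.
                for i in ii..(ii + tile).min(n) {
                    for k in kk..(kk + tile).min(n) {
                        let aik = a[i * n + k];
                        for j in jj..(jj + tile).min(n) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```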

the__alchemist · 04/24/2025

Love it. I've been using cudarc lately; would love to try this since it looks like it can share data structures between host and device (?). I infer that this is a higher-level abstraction.

zekrioca · 04/24/2025

Very interesting project! I am wondering how it compares against OpenCL, which I think adopts the same fundamental idea (write once, run everywhere)? Is the difference CubeCL's internal optimization for Rust that happens at compile time?

gitroom · 04/24/2025

Gotta say, the constant dance between all these GPU frameworks kinda wears me out sometimes - always chasing that better build, you know?

LegNeato · 04/24/2025

See also this overview for how it compares to other projects in the Rust and GPU ecosystem: https://rust-gpu.github.io/ecosystem/

bionhoward · 04/24/2025

Praying to the kernel gods for some Rust FP8 training

adastra22 · 04/24/2025

Where is the Metal love…

DarkmSparks · 04/24/2025

Wow, what are the downsides to this? It feels like it could be one of the biggest leaps in programming in a long time. Does it keep Rust's safety aspects? How does it compare with, say, OpenCL?
