Popping the GPU Bubble

171 points • by radq • today at 5:14 AM • 40 comments

Comments

I really appreciate this type of article. I feel like a lot of knowledge in LLM training and inference is locked inside the heads of practitioners, similar to compiler engineering before it.

To work in LLM training/inference you’re expected to know this stuff but to know this stuff you need to be working in the space.

augment_me • today at 8:26 AM

As someone who works in the field, the blog is nice but it has a lot of CODEX fingerprints on it, and it's also very specific to the size of the model in question in a way that is not explicit from the blog until the very last section.

In general, for some reason CODEX loves CUDA streams; it's the first optimization it goes for every time when writing GPU kernels. In many cases, however, this is not the bottleneck. It happens to be one here because the model in the blog is small (a 2.4ms forward pass is tiny, and 9B params sit on a single GPU). Large models are closer to 30-40ms. The CPU-GPU sync is 1-2ms, so when working on larger MoE models, scheduling tokens this way matters much less than, for example, scheduling computation/communication or optimizing kernels.

I wish the blog would state this at the start, with the premise of what has been done, or show with some benchmarking that this is indeed the bottleneck. Otherwise it's kind of overselling things imo.
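To put rough numbers on the comment above, here is a back-of-the-envelope sketch using only the figures quoted in it (1-2ms sync, a 2.4ms forward pass for the small model, 30-40ms for large ones). It assumes the CPU-GPU sync is fully serialized with the forward pass, with no overlap at all, which is a simplification; `bubble_fraction` is a hypothetical helper name, not something from the blog.

```python
def bubble_fraction(sync_ms: float, forward_ms: float) -> float:
    """Share of each step the GPU sits idle, assuming the sync
    is fully serialized with the forward pass (no overlap)."""
    return sync_ms / (sync_ms + forward_ms)


# Forward-pass times quoted in the comment; labels are illustrative.
cases = {
    "small model (2.4 ms forward)": 2.4,
    "large model (30 ms forward)": 30.0,
    "large model (40 ms forward)": 40.0,
}

for label, forward_ms in cases.items():
    low = bubble_fraction(1.0, forward_ms)   # 1 ms sync
    high = bubble_fraction(2.0, forward_ms)  # 2 ms sync
    print(f"{label}: {low:.0%} to {high:.0%} idle")
```

Under that assumption the small model idles for roughly 29% to 45% of each step, while the large models idle for only about 2% to 6%, which is consistent with the point that the sync stops being the dominant cost as the forward pass grows.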

gardnr • today at 5:51 AM

Different bubble than the one I was hoping for.

This appears to be different than the recent "Speculative Pipeline Decoding" paper: https://arxiv.org/abs/2605.30852

nl • today at 5:50 AM

> you find that the GPU often sits idle, not for lack of work, but because the CPU hasn't told it what to do next yet. This phenomenon is called a GPU bubble.

This is true, but I've never heard anyone refer to this as a GPU bubble before.

I think most people hear "GPU bubble" and think of a financial bubble of some kind.

amelius • today at 10:32 AM

The real GPU bubble will be when AI companies figure out they can better make their own ASICs and ditch all their GPUs onto the market.

NooneAtAll3 • today at 11:04 AM

I thought this was going to be an announcement of another GPU manufacturer :(

tjoekbezoer • today at 8:43 AM

Regarding the critique of the title: perhaps an analogy can be made to propeller cavitation on ships. Water influx rate, propeller design, and operational parameters all influence the detrimental effect of water bubbles forming, which deteriorates the system's efficiency.

The GPU would be the propeller, the influx is the work, and the operational parameters are what this article is about.

Schlagbohrer • today at 7:10 AM

I love the brand name, Moondream

fragmede • today at 8:20 AM

That's a terrible name for that, and I can't say that Hanlon's razor applies. The bubble that everyone's knowingly referring to is the stock market collapsing like in 2001. To choose a headline that can be mistaken for that just to get clicks is shit. You could've called it a GPU-CPU pipeline stall, but no, you intentionally chose a name that would be confused for something else just to get clicks?
