Hacker News

zozbot234 · yesterday at 4:01 PM · 2 replies

Isn't this just saying that your GPU use is bottlenecked by things such as VRAM bandwidth and RAM-VRAM transfers? That's normal and expected.


Replies

spmurrayzzz · yesterday at 7:52 PM

No, I'm saying there are quite a few more I/O and overhead bottlenecks than that. Even in the more efficient training frameworks, there's per-op dispatch overhead in Python itself: all the boxing/unboxing of Python objects into C++ handles, dispatcher lookup and setup, all the autograd bookkeeping, etc.
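To make the per-op point concrete, here's a toy pure-Python sketch (not actual framework internals; `dispatch_and_run` is a made-up stand-in for the real dispatcher path) of why a fixed per-op cost adds up, and why fusing many small ops into one helps:

```python
# Toy illustration: each "op" pays a fixed Python-level dispatch cost
# (boxing args, dispatcher lookup, autograd bookkeeping) before doing
# any real math. N small ops pay that cost N times; a fused op pays once.

def dispatch_and_run(xs):
    # Hypothetical stand-in for one dispatched elementwise op (x * 2).
    # In a real framework, the constant overhead sits before this math.
    return [x * 2.0 for x in xs]

def run_many_small_ops(data, n_ops):
    # n_ops separate dispatches: overhead scales with op count.
    for _ in range(n_ops):
        data = dispatch_and_run(data)
    return data

def run_one_fused_op(data, n_ops):
    # Same math, a single dispatch: overhead paid once.
    factor = 2.0 ** n_ops
    return [x * factor for x in data]

# Both paths compute the same result; only the dispatch count differs.
print(run_many_small_ops([1.0, 3.0], 10))  # [1024.0, 3072.0]
print(run_one_fused_op([1.0, 3.0], 10))    # [1024.0, 3072.0]
```

This is the same intuition behind kernel fusion and tracing compilers: the math is identical, but the constant per-op bookkeeping stops scaling with the number of tiny ops.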

The sum of all these bottlenecks is why you'd never get to 100% MFU (though, as I was conceding, you probably don't need to in order to get value).
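For reference, MFU (Model FLOPs Utilization) is just the ratio of the FLOPs your model's useful math actually achieves to the hardware's theoretical peak. A minimal sketch (the 312 TFLOP/s figure is the published A100 BF16 peak; the achieved number is an illustrative assumption):

```python
def mfu(achieved_flops_per_s, peak_flops_per_s):
    # Model FLOPs Utilization: fraction of the accelerator's peak
    # throughput that the model's useful math actually sustains.
    return achieved_flops_per_s / peak_flops_per_s

# Hypothetical run: 140 TFLOP/s achieved on an A100 (312 TFLOP/s BF16 peak)
print(round(mfu(140e12, 312e12), 2))  # 0.45
```

Every dispatch, transfer, and bookkeeping stall above eats into the numerator, which is why real training runs sit well below 1.0.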