Hacker News

xyzsparetimexyz, today at 12:31 PM

Single core vs multi core accounts for much of this


Replies

cdavid, today at 12:55 PM

Not really. A GPU's many cores, at least for fp32, give you 2 to 4 orders of magnitude over a high-speed CPU.

The rest comes from going from "python float" (i.e. not numpy) to C, which already gives you 2 to 3 orders of magnitude, and then another 2 to 3 from plain C to optimized SIMD.

See e.g. https://github.com/Avafly/optimize-gemm for how you can get 2 to 3 orders of magnitude just from C.
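
To get a feel for the "python float" baseline, here is a rough sketch that times a naive triple-loop matrix multiply on plain Python floats and reports the achieved GFLOP/s. The names `matmul_pure`, `gflops` and `bench` are made up for this illustration, and the numpy comparison is optional and only stands in for "compiled code"; it is not the method used in the linked repo. Actual ratios depend heavily on the machine, the matrix size and the BLAS numpy is linked against.

```python
import time


def matmul_pure(a, b):
    """Naive triple-loop matrix multiply on lists of Python floats."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def gflops(n, seconds):
    """An n x n by n x n matmul performs about 2 * n**3 floating-point ops."""
    # Guard against a zero reading from the timer on very small inputs.
    return 2 * n**3 / max(seconds, 1e-9) / 1e9


def bench(n=64):
    """Time the pure-Python version, plus numpy if it happens to be installed."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]

    t0 = time.perf_counter()
    matmul_pure(a, b)
    results = {"pure_python": gflops(n, time.perf_counter() - t0)}

    try:
        import numpy as np  # optional dependency, assumed only for comparison
    except ImportError:
        return results

    a_np = np.array(a, dtype=np.float32)
    b_np = np.array(b, dtype=np.float32)
    t0 = time.perf_counter()
    a_np @ b_np
    results["numpy_fp32"] = gflops(n, time.perf_counter() - t0)
    return results


if __name__ == "__main__":
    for name, rate in bench().items():
        print(f"{name}: {rate:.4f} GFLOP/s")
```

A single small run like this is noisy, so treat the printed numbers as an order-of-magnitude indication and raise `n` or repeat the timing for anything more serious.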
