I'm not following the whole LLM space, but
> the compute needed to perform matrix multiplications goes up as the cube of their size,
are they really not using even Strassen multiplication?
AFAIK the best practical matrix multiplication algorithms scale as roughly N^2.8, which is close enough to N^3 not to matter for the point I'm trying to make.
I'm not aware of any major BLAS library that uses Strassen's algorithm. There are a few reasons for this; one of the big ones is that Strassen has much worse numerical stability than traditional matrix multiplication. Another big one is that for very large dense matrices--which are multiplied with various flavors of parallel algorithms--Strassen vastly increases the communication overhead. Not to mention that the largest matrices are probably using sparse matrix arithmetic anyway, which is a whole different set of algorithms.
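For reference, here's a minimal sketch of what Strassen actually does (assuming square matrices whose size is a power of two; the M1..M7 names follow the standard textbook presentation, not any particular library). The trick is trading 8 recursive block multiplications for 7 at the cost of extra additions, which is also where the numerical-error and communication problems come from:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen multiply; falls back to ordinary matmul below `cutoff`."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 7 recursive multiplications instead of 8 -> O(n^log2(7)) ~ O(n^2.807),
    # at the cost of many extra additions (which hurt accuracy and locality)
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

Note that real implementations need a cutoff like the one above anyway, since the constant factors make Strassen slower than tuned dense kernels on small blocks.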