
ossa-ma · yesterday at 7:47 PM

You're right, but my understanding is that Groq's LPU architecture makes it inference-only in practice.

Like, Groq's chips only have 230MB of SRAM per chip vs 80GB on an H100, and training is memory hungry: you need to hold model weights + gradients + optimizer states + intermediate activations.
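For scale, a rough back-of-the-envelope sketch (my own numbers, assuming Adam with mixed precision; the exact breakdown varies with framework, parallelism, and activation checkpointing):

    # Static training memory for a 7B-parameter model,
    # assuming fp16 weights/grads + fp32 Adam state.
    params = 7e9

    weights_fp16 = params * 2  # fp16 model weights
    grads_fp16   = params * 2  # fp16 gradients
    master_fp32  = params * 4  # fp32 master copy of weights
    adam_m_fp32  = params * 4  # Adam first moment
    adam_v_fp32  = params * 4  # Adam second moment

    static_bytes = (weights_fp16 + grads_fp16 + master_fp32
                    + adam_m_fp32 + adam_v_fp32)
    print(f"weights+grads+optimizer: {static_bytes / 1e9:.0f} GB")  # ~112 GB
    print(f"vs 230 MB of SRAM: {static_bytes / 230e6:,.0f}x larger")

And that's before activations, which scale with batch size and sequence length on top of the ~112 GB of static state.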


Replies

refibrillator · yesterday at 8:31 PM

H100 has 80 GB of HBM3. There’s only like 37 MB of SRAM on a single chip.