cadamsdotcom • today at 5:18 PM • 1 reply

Transformers scale poorly with context window size and parameter count.

Which means they’re really impressive when those N’s are small!
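To put rough numbers on that scaling claim, here is a back-of-the-envelope sketch. It assumes the standard cost model for dense self-attention (the score and mixing matmuls grow with the square of the context length) and the common rule of thumb of about 2 FLOPs per parameter per token for the weight matmuls. The function names and the example sizes are made up for illustration and don't describe any particular model.

```python
def attention_flops(n_ctx: int, d_model: int) -> int:
    """Rough FLOPs for one self-attention layer's two big matmuls."""
    scores = 2 * n_ctx * n_ctx * d_model  # Q @ K^T: quadratic in context length
    mixing = 2 * n_ctx * n_ctx * d_model  # softmax(scores) @ V: also quadratic
    return scores + mixing


def weight_flops(n_ctx: int, n_params: int) -> int:
    """Rule-of-thumb FLOPs for the weight matmuls: ~2 per parameter per token."""
    return 2 * n_params * n_ctx


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the trend.
    for n_ctx in (1024, 2048, 4096):
        print(n_ctx,
              attention_flops(n_ctx, d_model=512),
              weight_flops(n_ctx, n_params=10_000_000))
```

Under these assumptions, doubling the context quadruples the attention term while only doubling the weight term, which is roughly why small N's look so good.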

I’m but a pundit in this area, so I don’t know much. But one wonders if there’s a future in burning larger models into FPGAs: whether big enough FPGAs exist (or can be built), and whether locating specialized compute right next to the memory it needs can speed things up.

It would likely need a lot of work on algorithm parallelism, and that work would translate back to CPUs/GPUs too.

Replies

T-A • today at 5:54 PM
