joshuamoyers • today at 4:59 PM • 0 replies

I think it'd be very hard to achieve viable tokens/s, or to get arithmetic intensity high enough in general, since many existing training and inference workloads are memory-bandwidth bound. It definitely seems conceptually possible to have a slow, distributed pipeline, though.
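To put rough numbers on the arithmetic-intensity point, here's a minimal roofline sketch. The accelerator figures (100 TFLOP/s peak, 1 TB/s memory bandwidth) and the 4096-wide matrices are hypothetical values picked for illustration, not measurements of any real chip or model. It also assumes fp16 operands that are each read from memory exactly once.

```python
def matmul_arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming A and B are read once and C is written once."""
    flops = 2 * m * k * n  # one multiply + one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, intensity * bandwidth_bytes_per_s)


# Hypothetical accelerator numbers, chosen only for illustration.
PEAK_FLOPS = 100e12  # 100 TFLOP/s
BANDWIDTH = 1e12     # 1 TB/s
RIDGE = PEAK_FLOPS / BANDWIDTH  # FLOP/byte needed to become compute-bound

# Single-token decode: a 1 x 4096 activation times a 4096 x 4096 weight matrix.
decode_ai = matmul_arithmetic_intensity(1, 4096, 4096)
# Large-batch training/prefill: 4096 rows processed at once.
prefill_ai = matmul_arithmetic_intensity(4096, 4096, 4096)

decode_tflops = attainable_flops(decode_ai, PEAK_FLOPS, BANDWIDTH) / 1e12
prefill_tflops = attainable_flops(prefill_ai, PEAK_FLOPS, BANDWIDTH) / 1e12

print(f"ridge point: {RIDGE:.0f} FLOP/byte")
print(f"decode:  {decode_ai:.2f} FLOP/byte -> {decode_tflops:.2f} TFLOP/s")
print(f"prefill: {prefill_ai:.0f} FLOP/byte -> {prefill_tflops:.0f} TFLOP/s")
```

With batch size 1 the matmul does about one FLOP per byte it loads, so under these assumptions it runs at roughly 1% of peak; only the large-batch case clears the ridge point and becomes compute-bound.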