The bigger the [dense] model, the longer inference takes per token; it looks roughly linear in parameter count, which makes sense if single-stream decoding is memory-bandwidth bound (every token has to stream all the weights once). In that sense, how long you'd need to wait to hit, say, ~20 tok/s on this hardware... maybe never (save a significant firmware update / translation layer).
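A quick back-of-envelope sketch of why: if decode speed is roughly usable memory bandwidth divided by model size in bytes, tok/s falls off linearly as the model grows. The 100 GB/s bandwidth figure, the 4-bit quantization, and the `est_tok_per_s` helper are all illustrative assumptions, not measurements of any particular machine.

```python
# Back-of-envelope decode speed for a dense model, assuming
# single-stream decoding is memory-bandwidth bound: each token
# streams all the weights once. Numbers are illustrative only.

def est_tok_per_s(params_b: float, bytes_per_param: float, bw_gb_s: float) -> float:
    """tokens/s ~= usable bandwidth / model size in bytes."""
    model_gb = params_b * bytes_per_param  # e.g. 70B at 4-bit ~= 35 GB
    return bw_gb_s / model_gb

# Hypothetical machine with ~100 GB/s of usable memory bandwidth:
for p in (7, 13, 70):
    print(f"{p}B @ 4-bit: ~{est_tok_per_s(p, 0.5, 100):.1f} tok/s")
# 7B:  ~28.6 tok/s
# 13B: ~15.4 tok/s
# 70B: ~2.9 tok/s  -> ~20 tok/s for a big dense model: maybe never
```

Under these assumptions, a firmware update or translation layer only helps to the extent it unlocks bandwidth you aren't already using; it can't move the linear ceiling itself.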