Hacker News

londons_explore today at 7:28 AM | 2 replies

So why only 30,000 tokens per second?

If the chip is designed as the article says, they should be able to do 1 token per clock cycle...

And whilst I'm sure the propagation time is long through all that logic, it should still be able to do tens of millions of tokens per second...


Replies

wmf today at 8:27 AM

You still need to do a forward pass per token, and each token depends on the one before it. With massive batching and full pipelining you might be able to break the dependencies and output one token per cycle, but clearly they aren't doing that.
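A rough back-of-the-envelope sketch of that argument, in Python. The function name and every number here (clock rate, pipeline depth, batch sizes) are made-up assumptions for illustration, not figures from the article. The model assumes one token can enter the pipeline per cycle, a forward pass takes `pipeline_depth` cycles, and a sequence cannot start its next token until the previous one comes out.

```python
def tokens_per_second(clock_hz, pipeline_depth, batch_size):
    """Aggregate decode throughput of a fully pipelined forward pass.

    A single sequence gets one token every `pipeline_depth` cycles,
    because token t+1 cannot enter until token t has left the pipeline.
    Independent sequences can fill the otherwise idle pipeline slots,
    but no more than `pipeline_depth` of them can be in flight at once.
    """
    in_flight = min(batch_size, pipeline_depth)
    return clock_hz * in_flight / pipeline_depth


# Hypothetical numbers, chosen only to show the shape of the tradeoff.
CLOCK_HZ = 1_000_000_000   # assumed 1 GHz clock
DEPTH = 1000               # assumed forward-pass latency in cycles

single_stream = tokens_per_second(CLOCK_HZ, DEPTH, batch_size=1)
full_batch = tokens_per_second(CLOCK_HZ, DEPTH, batch_size=DEPTH)
oversized = tokens_per_second(CLOCK_HZ, DEPTH, batch_size=5 * DEPTH)
```

Under these assumptions a single stream gets clock / depth tokens per second, and only a batch at least as large as the pipeline depth reaches one token per cycle in aggregate. Any extra batch beyond that adds nothing in this simple model.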

menaerus today at 8:46 AM

Reading from and writing to memory alone takes much more than a clock cycle.