
remexre, yesterday at 12:16 AM

For each token generated, you only send one token's worth of activations between layers; the keys and values for the previous tokens are already in the KV cache.
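To put rough numbers on that, here is a back-of-the-envelope sketch. The model shape (hidden size 4096, 8 KV heads of dimension 128, 2-byte values, a 1000-token context) is hypothetical and picked only for illustration, and the helper names are mine, not from any library. It also assumes each layer keeps its own KV cache locally, so the cache never has to cross the layer boundary.

```python
def activation_bytes(num_tokens: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes of hidden-state activations handed from one layer to the next."""
    return num_tokens * hidden_size * bytes_per_value


def kv_cache_bytes_per_layer(context_len: int, num_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of cached keys and values one layer keeps locally.

    The leading factor of 2 accounts for storing both K and V.
    """
    return 2 * context_len * num_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
HIDDEN_SIZE = 4096
NUM_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_LEN = 1000

# Prefill: the whole prompt's activations pass between layers at once.
prefill = activation_bytes(CONTEXT_LEN, HIDDEN_SIZE)
# Decode: each step passes the activations for a single new token.
decode = activation_bytes(1, HIDDEN_SIZE)
# What stays put inside one layer instead of being re-sent every step.
cache = kv_cache_bytes_per_layer(CONTEXT_LEN, NUM_KV_HEADS, HEAD_DIM)

print(f"prefill, between layers: {prefill} bytes")
print(f"decode step, between layers: {decode} bytes")
print(f"KV cache held per layer: {cache} bytes")
```

Under those assumptions the per-step traffic between layers during decoding is a few kilobytes and does not grow with context length; only the locally held cache does.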