sigmoid10, today at 1:00 PM

You can cache the K and V matrices, but with matrices that large you'll still pay a lot of compute for the attention step, even if the user just adds a five-word question.
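To put rough numbers on that, here is a back-of-the-envelope sketch of the attention matmul cost when K and V are already cached. The function names and the model sizes in the usage line are made up for illustration, and the count ignores softmax, the Q/K/V projections, the MLP, and the memory traffic of reading the cache, so treat it as an order-of-magnitude estimate only.

```python
def cached_attention_flops(n_cached, n_new, d_model, n_layers):
    """Approximate FLOPs for the two attention matmuls when the K and V
    rows for n_cached tokens are already in the cache.

    Assumptions (not from the original comment): 2 FLOPs per multiply-add,
    all heads together span d_model, and causal masking among the new
    tokens is ignored, so this slightly overestimates.
    """
    total_keys = n_cached + n_new
    qk = 2 * n_new * total_keys * d_model  # Q @ K^T, summed over heads
    av = 2 * n_new * total_keys * d_model  # softmax(QK^T) @ V, summed over heads
    return n_layers * (qk + av)


def full_attention_flops(n_tokens, d_model, n_layers):
    """Same estimate with no cache: every token is a new query."""
    return cached_attention_flops(0, n_tokens, d_model, n_layers)


if __name__ == "__main__":
    # Hypothetical sizes: 1M cached tokens, a 5-token question,
    # d_model=4096, 32 layers.
    print(cached_attention_flops(1_000_000, 5, 4096, 32))
```

The cache removes the quadratic recomputation over the old tokens, but each new token still attends to every cached key, so the cost per added token grows linearly with the length of the cached context.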