
andy12_ | last Sunday at 6:25 PM | 0 replies | view on HN

Kind of. You could in theory use LoRA for this, but it probably wouldn't have enough capacity to be a proper substitute for the attention mechanism. Instead, a full MLP is trained as the input chunks get processed.
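
To make that concrete, here's a rough sketch of what chunk-wise test-time training of a memory MLP can look like. The names (`key_proj`, `value_proj`), sizes, loss, and optimizer are all illustrative assumptions, not lifted from any particular paper's code:

    import torch
    import torch.nn as nn

    d = 64  # hypothetical embedding size

    # The "memory" is a full MLP; its weights play the role of the state.
    memory = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    key_proj = nn.Linear(d, d, bias=False)    # hypothetical projections,
    value_proj = nn.Linear(d, d, bias=False)  # roughly analogous to W_K / W_V
    opt = torch.optim.SGD(memory.parameters(), lr=1e-2)

    def process_chunk(x):
        # x: (chunk_len, d) embeddings for one chunk of the input.
        # Read: query the memory with this chunk's keys.
        retrieved = memory(key_proj(x))
        # Write: one gradient step so the MLP associates this chunk's
        # keys with its values (the test-time "training" of the memory).
        loss = ((memory(key_proj(x).detach()) - value_proj(x).detach()) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        return retrieved

The point is just that the thing carrying information across chunks is the MLP's weights themselves, updated by gradient steps as each chunk arrives, rather than a low-rank adapter or an attention cache.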