Hacker News

nubg · last Sunday at 2:22 PM

Very interesting. Is it correct for me to imagine it as some kind of "LoRA" that's continuously adapted as the model goes through its day?

If so, could there perhaps be a step where the LoRA is merged back into the main model?

That would be like sleeping :-)
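
For concreteness, here's a minimal toy sketch of what "merging the LoRA back into the main model" amounts to. Shapes, the scaling factor, and variable names are my own illustrative assumptions, not any particular library's implementation: the adapter is a low-rank product B @ A, and merging simply folds it into the frozen base weight.

    # Toy illustration (assumed shapes/names): merging a low-rank adapter
    # delta_W = B @ A back into a frozen base weight matrix.
    import torch

    d_out, d_in, rank = 768, 768, 8
    W_base = torch.randn(d_out, d_in)      # frozen base weight
    A = torch.randn(rank, d_in) * 0.01     # low-rank factors adapted during the "day"
    B = torch.zeros(d_out, rank)
    scale = 1.0                            # LoRA scaling (alpha / rank in common setups)

    # "Sleeping": fold the adapter into the base weight, then drop A and B.
    W_merged = W_base + scale * (B @ A)
    assert W_merged.shape == W_base.shape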


Replies

robrenaud · last Sunday at 2:34 PM

I don't think that's a great analogy.

LoRAs tend to be adapters bolted onto systems by people other than the system designers, and they are low-rank factorizations.

There is nothing low-rank or adapter-like here.

andy12_ · last Sunday at 6:25 PM

Kind of. You could theoretically use LoRA for this, in fact, but it probably wouldn't have enough capacity to be a proper substitute for the attention mechanism. Instead, a full MLP is trained as input chunks get processed.
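
A rough sketch of that idea, with my own toy module names and hyperparameters rather than the actual method: a small MLP acts as the memory, and for each incoming chunk it is first read with its current weights and then updated with a gradient step or two so it absorbs that chunk's key-to-value mapping.

    # Toy sketch (assumed names/hyperparameters): an MLP "memory" whose weights
    # are updated online per input chunk, standing in for attention over context.
    import torch
    import torch.nn as nn

    class MLPMemory(nn.Module):
        def __init__(self, dim: int, hidden: int = 512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.GELU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    def process_chunks(chunks, memory: MLPMemory, lr: float = 1e-2, steps: int = 1):
        """For each chunk: read from the memory with current weights, then take a
        few gradient steps so the memory fits that chunk's key -> value mapping."""
        opt = torch.optim.SGD(memory.parameters(), lr=lr)
        outputs = []
        for keys, values in chunks:                # each chunk: (keys, values), shape (n, dim)
            outputs.append(memory(keys).detach())  # read
            for _ in range(steps):                 # write (online training step)
                opt.zero_grad()
                loss = torch.nn.functional.mse_loss(memory(keys), values)
                loss.backward()
                opt.step()
        return outputs

    # Toy usage: a stream of 4 chunks of 16 tokens each, model dim 64.
    dim = 64
    memory = MLPMemory(dim)
    stream = [(torch.randn(16, dim), torch.randn(16, dim)) for _ in range(4)]
    outs = process_chunks(stream, memory)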