Hacker News

K0balt · 11/08/2024 · 3 replies

So, in layman’s terms, LoRA appears to “traumatize” the model to some degree, connecting the vector space with strong “jumpers” (intruder dimensions) to change its behavior, instead of subtly conforming the entire model into a shape that accommodates the new data.

These jumpers, or shortcuts, do create connections between the relevant new concepts in the model, but by connecting them directly instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become deemphasized, leading to forgetting of previously held associations.

Because of this, full fine-tuning generally produces better results than LoRA, especially when forgetting of existing training is detrimental.

Or, to further oversimplify the issue in SE terms, LoRA == monkeypatching. (Is this a kind of intruder dimension?)
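
To make that concrete, here is a minimal PyTorch-style sketch of the difference (the class name LoRALinear and the hyperparameters are illustrative, not taken from the paper): the frozen base weight W is left untouched, and only a low-rank correction B @ A is trained on top of it — that correction is the “jumper” in the analogy above.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Minimal sketch: the frozen base weight W stays untouched and a
        low-rank update B @ A is added on top, so only A and B are trained."""

        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # full fine-tuning would update these too
            out_features, in_features = base.weight.shape
            self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at start
            self.scale = alpha / rank

        def forward(self, x):
            # Effective weight is W + scale * (B @ A), computed without
            # materializing the full-rank matrix.
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Full fine-tuning, by contrast, sends gradients into every entry of W itself, so the new behavior has to be folded into the existing weight structure rather than routed through a separate low-rank pathway.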


Replies

six_four_eight · 11/08/2024

I wonder how this compares to 'catastrophic forgetting', which can be a problem with full fine-tuning. Or at least that's what I've just been reading as a case _for_ using LoRA, as it's not susceptible to that. I guess this paper shows LoRA causes forgetting in a different way.

Are there good general principles yet for which fine-tuning method to use in which situations? It still seems quite difficult to know ahead of time what's going to happen.

ismailmaj · 11/08/2024

How does it compare to partially fine-tuning the model by freezing most of the network and training only the last few layers?
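
For reference, a rough sketch of that approach in PyTorch / Hugging Face terms (GPT-2 and the choice of two unfrozen blocks are placeholders, not something from the paper). Unlike LoRA, the unfrozen layers still receive full-rank updates; everything else is held fixed.

    from transformers import AutoModelForCausalLM

    # GPT-2 used purely as a stand-in; the same pattern applies to other models.
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Freeze everything first...
    for p in model.parameters():
        p.requires_grad = False

    # ...then unfreeze only the last two transformer blocks and the final layer norm.
    for block in model.transformer.h[-2:]:
        for p in block.parameters():
            p.requires_grad = True
    for p in model.transformer.ln_f.parameters():
        p.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"training {trainable:,} of {total:,} parameters")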

Mockapapella · 11/08/2024

Thank you for this layman's explanation