So, in layman’s terms, LoRA appears to “traumatize” the model to some degree, connecting the vector space with strong “jumpers” (intruder dimensions) to change its behavior, instead of subtly conforming the entire model into a shape that accommodates the new data.
These jumpers, or shortcuts, do create connections between the relevant new concepts in the model, but by wiring them together directly instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become de-emphasized, leading to forgetting of previously held associations.
Because of this, full fine-tuning generally produces better results than LoRA, especially when forgetting existing training is detrimental.
Or, to further oversimplify the issue in SE terms, LoRA == monkeypatching. (Is this a kind of intruder dimension?)
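For anyone who hasn't seen the mechanism: the "jumper" picture above corresponds to LoRA's low-rank update. A toy sketch (plain numpy, not the actual paper's setup; shapes and the rank `r` are illustrative) of how the frozen weight `W` gets a bolted-on `B @ A` path:

```python
import numpy as np

def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base weight W plus the low-rank 'jumper' B @ A.

    W: (d, d) frozen pretrained weight
    A: (r, d) trainable down-projection
    B: (d, r) trainable up-projection (initialized to zero)
    """
    return W @ x + alpha * (B @ (A @ x))

d, r = 4, 1  # toy sizes; real models use d in the thousands, r ~ 4..64
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))
A = rng.standard_normal((r, d))
B = np.zeros((d, r))  # zero init means LoRA starts as a no-op
x = rng.standard_normal(d)

# With B zeroed, the adapted layer matches the base model exactly;
# training then pushes all new behavior through the rank-r shortcut.
assert np.allclose(lora_forward(W, A, B, x), W @ x)
```

The point of the sketch: every change LoRA makes to the layer has to fit through a rank-`r` bottleneck, which is roughly why the paper's "intruder dimensions" show up as a few strong new directions rather than diffuse adjustments across `W`.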
How does it compare to partially fine-tuning the model by freezing everything except the last few layers?
Thank you for this layman explanation
I wonder how this compares to 'catastrophic forgetting', which can be a problem with full fine-tuning. Or at least that's what I've just been reading as a case _for_ using LoRA, since it's supposedly not susceptible to that. I guess this paper shows LoRA causes forgetting in a different way.
Are there good general principles yet for which fine-tuning method to use in which situations? It still seems quite difficult to know ahead of time what's going to happen.