Hacker News

six_four_eight 11/08/2024

I wonder how this compares to 'catastrophic forgetting', which can be a problem with full fine-tuning. Or at least that's what I've just been reading as a case _for_ using LoRA, since it's supposedly not susceptible to that. I guess this paper shows LoRA causes forgetting in a different way.

Are there good general principles yet for which fine-tuning method to use in a given situation? It still seems quite difficult to know ahead of time what's going to happen.
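For context, here is a minimal sketch of the mechanism behind that argument, written in plain PyTorch (my own illustration, not from the thread or the paper; the `LoRALinear` class and its hyperparameters are hypothetical). The pretrained weights are frozen and only a small low-rank adapter is trained, which is the usual reasoning for why LoRA should disturb the base model's existing knowledge less than full fine-tuning:

```python
# Hypothetical sketch: a LoRA-style linear layer in plain PyTorch.
# The base weights are never updated; only the small low-rank adapter
# (A, B) is trained, and dropping the adapter restores the original layer exactly.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)            # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank update: delta_W = B @ A, far fewer parameters than W itself.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only A and B are trainable
```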


Replies

K0balt 11/09/2024

Catastrophic forgetting or “psychosis” seems to happen when I overtrain. It’s easy to make it happen to models that have been extensively tuned already, but the base models hold up much better. I’m pretty sure there is a point in n-dimensional space where x discrete vectors with n dimensions stop encoding usefully distinct patterns.
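A toy illustration of that capacity intuition (my own sketch, not from the comment; the dimension and vector counts are arbitrary): as you pack more random unit vectors into a fixed number of dimensions, the closest pair keeps getting more similar, i.e. the vectors stop being usefully distinct.

```python
# Hypothetical illustration: pack increasing numbers of random unit vectors
# into a fixed dimensionality and track the maximum pairwise cosine similarity,
# a rough proxy for when x vectors in n dimensions stop being usefully distinct.
import numpy as np

rng = np.random.default_rng(0)
n = 64  # fixed dimensionality

for x in (16, 64, 256, 1024, 4096):
    v = rng.standard_normal((x, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # normalize to unit length
    cos = v @ v.T                                   # all pairwise cosine similarities
    np.fill_diagonal(cos, -1.0)                     # ignore self-similarity
    print(f"{x:5d} vectors in {n} dims -> max pairwise cosine {cos.max():.2f}")
```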