Yeah, honestly not too surprising. Happy someone actually ran the experiments, though.
I think we already know that NNs with limited data tend to overfit, so to train LoRA you need stronger regularization mechanisms, including:
* Fixing A as a projection matrix so it doesn't rotate into an "easier" orientation for B to learn.
* Periodically merging AB into W_tuned to simulate full-model fine-tuning behavior (a rough code sketch of both tricks follows the list).
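
Here's a minimal sketch of what I mean, in PyTorch, with hypothetical class/argument names: A is registered as a frozen random projection (so only B trains), and merge_() periodically folds the current B @ A update into the base weight and restarts B from zero.

```python
import torch
import torch.nn as nn

class FrozenALoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, scale=1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)       # W stays frozen
        # A: fixed random projection, never trained -> no "easy" rotation for B
        self.register_buffer("A", torch.randn(rank, in_features) / rank**0.5)
        # B: the only trainable low-rank factor, starts at zero
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = scale

    def forward(self, x):
        # base(x) + scale * x A^T B^T  ==  x (W + scale * B A)^T
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    @torch.no_grad()
    def merge_(self):
        # Fold the accumulated low-rank update into W, then reset B,
        # roughly mimicking how full fine-tuning keeps updating W in place.
        self.base.weight += self.scale * (self.B @ self.A)
        self.B.zero_()
```

You'd call merge_() every N steps from the training loop; how often is a hyperparameter, and this is just the shape of the idea, not a tuned recipe.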
I think fundamentally, LoRA is sound because the gradient matrix is low-rank by nature.
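
Toy check of that intuition (NumPy, made-up sizes): for a single linear layer y = x @ W.T, the weight gradient is dY.T @ X, so its rank is at most the batch size, far below the weight dimensions. This is only the single-layer, single-batch case, but it's the basic intuition behind the claim.

```python
import numpy as np

d_in, d_out, batch = 1024, 1024, 8
X = np.random.randn(batch, d_in)       # layer input
dY = np.random.randn(batch, d_out)     # upstream gradient w.r.t. the output
grad_W = dY.T @ X                      # weight gradient, shape (d_out, d_in)
print(np.linalg.matrix_rank(grad_W))   # <= 8, nowhere near 1024
```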