
liuliu · 11/08/2024

Yeah, honestly not too surprising. Happy someone ran the experiments, though.

I think we know that NNs with limited data tend to overfit, so to train LoRA you need stronger regularization mechanisms, including:

* Fixing A as a projection matrix so it doesn't rotate to an "easier" orientation for B to learn.

* Periodically merging AB into W_tuned to simulate full-model finetuning behavior (a rough sketch of both ideas follows below).
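
A minimal PyTorch sketch of those two regularizers, under the usual ΔW = B·A convention; `LoRALinear`, `merge()`, and the hyperparameters here are illustrative, not anything from the original experiments:

```python
# Sketch: LoRA adapter with a frozen random A and a periodic merge step.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # frozen pretrained weight
        out_f, in_f = base.weight.shape
        # A is a fixed random projection (a buffer, never trained), so it
        # cannot rotate into an orientation that makes B's job artificially easy.
        self.register_buffer("A", torch.randn(rank, in_f) / rank**0.5)
        self.B = nn.Parameter(torch.zeros(out_f, rank))  # the only trainable part
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

    @torch.no_grad()
    def merge(self):
        # Fold the low-rank update into the base weight and restart B from
        # zero, approximating full-model finetuning behavior over time.
        self.base.weight += self.scale * self.B @ self.A
        self.B.zero_()
```

Calling `merge()` every N optimizer steps is one way to realize the second bullet; how often to merge (and whether it helps at all) is an empirical question.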

I think fundamentally, LoRA is sound, because the gradient matrix is low-rank by nature.
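
To make that low-rank claim concrete: for a linear layer, the per-example weight gradient is an outer product (rank 1), so a batch gradient has rank at most the batch size. A quick illustrative check (the dimensions and loss are arbitrary):

```python
# The weight gradient of a linear map over a batch of 4 examples has rank <= 4.
import torch

torch.manual_seed(0)
batch, d_in, d_out = 4, 256, 256
W = torch.randn(d_out, d_in, requires_grad=True)
x = torch.randn(batch, d_in)
loss = (x @ W.T).pow(2).sum()
loss.backward()
print(torch.linalg.matrix_rank(W.grad).item())  # at most 4, far below 256
```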