Hacker News

sorenjan | 11/08/2024 | 1 reply

> We randomly initialize A such that it has singular values of 1, freeze it, and only train B. When we do this, we see a sharp reduction in high ranking intruder dimensions in comparison to those in normal LoRA

This sounds interesting, but I can't see that they do much with this result. Are they saving it for a follow-up paper? I would think that if their whole paper is about a big problem with LoRAs, and they then find what looks like an easy solution to that problem, that would warrant more than a paragraph just before the conclusion.
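For concreteness, here's roughly what that frozen-A variant could look like if you sketch it in PyTorch (the module name, rank, and scaling below are my own illustrative choices, not the paper's code). An orthogonal init gives A singular values of exactly 1, and only B receives gradients:

```python
import torch
import torch.nn as nn

class FrozenALoRALinear(nn.Module):
    """LoRA layer where A is fixed with unit singular values and only B is trained.

    Sketch of the quoted variant, not the authors' implementation.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # frozen pretrained weight

        # Orthogonal init => A has singular values of exactly 1; keep it frozen.
        A = torch.empty(rank, base.in_features)
        nn.init.orthogonal_(A)
        self.A = nn.Parameter(A, requires_grad=False)

        # B starts at zero so the adapted layer matches the base layer at step 0.
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

Only B ends up in the optimizer, and the update B A then has its row space pinned to the fixed subspace spanned by A, while only the output-side directions in B adapt; presumably that constraint is what suppresses the high-ranking intruder dimensions.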

It would also have been interesting if they had included the DoRA method; they reference it briefly, and that paper claims DoRA's learning behavior resembles full fine-tuning.

But perhaps this paper is focused on LoRA behavior, and a separate paper comparing various improvements is better.


Replies

liuliu | 11/08/2024

Yeah, honestly not too surprising. Happy someone ran the experiments, though.

I think we know that NNs trained on limited data tend to overfit, so to train a LoRA you need stronger regularization mechanisms, including:

* Fixing A as a projection matrix so it doesn't rotate to an "easier" orientation for B to learn.

* Periodically merging AB into W_tuned to simulate full-model fine-tuning behavior (a sketch of this is below).
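A sketch of what that second point could look like, assuming the FrozenALoRALinear layer from the sketch above (the helper name and reset behavior are my own illustrative choices):

```python
import torch

@torch.no_grad()
def merge_lora_into_base(layer) -> None:
    """Fold the current low-rank update into the base weight, then reset B.

    Hypothetical helper: after merging, later steps learn a fresh low-rank
    delta on top of the already-updated weight, which is closer to how full
    fine-tuning accumulates changes.
    """
    delta = layer.scale * (layer.B @ layer.A)   # (out_features, in_features)
    layer.base.weight += delta                  # W_tuned <- W_tuned + scale * B A
    layer.B.zero_()                             # restart the low-rank delta from zero
```

Calling something like this every N optimizer steps is one way to read "periodically merging AB into W_tuned".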

I think fundamentally LoRA is sound, because the gradient matrix is low-rank by nature.
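That last point is easy to check for a single linear layer: the gradient of the loss with respect to W is a sum of one outer product per example, so its rank is at most the batch size. A quick numerical sanity check (my own toy example, not from the paper):

```python
import torch

torch.manual_seed(0)
batch, in_features, out_features = 4, 512, 512

W = torch.randn(out_features, in_features, requires_grad=True)
x = torch.randn(batch, in_features)

loss = (x @ W.T).pow(2).mean()
loss.backward()

# dL/dW = sum_i (dL/dy_i) x_i^T, so rank(W.grad) <= batch size.
print(torch.linalg.matrix_rank(W.grad))  # at most 4 here, far below the full 512
```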