Yeah, honestly not too surprising. Happy someone actually ran the experiments, though.
I think we already know that NNs with limited data tend to overfit, so to train LoRA you need stronger regularization mechanisms, including:
* Fixing A as a projection matrix so it doesn't rotate into an "easier" orientation for B to learn.
* Periodically merging AB into W_tuned to simulate full-model fine-tuning behavior (a rough code sketch of both tricks follows the list).
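
Here's a minimal sketch of what I mean, in PyTorch, with hypothetical class/argument names: A is registered as a frozen random projection (so only B trains), and merge_() periodically folds the current B @ A update into the base weight and restarts B from zero.

```python
import torch
import torch.nn as nn

class FrozenALoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, scale=1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)       # W stays frozen
        # A: fixed random projection, never trained -> no "easy" rotation for B
        self.register_buffer("A", torch.randn(rank, in_features) / rank**0.5)
        # B: the only trainable low-rank factor, starts at zero
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = scale

    def forward(self, x):
        # base(x) + scale * x A^T B^T  ==  x (W + scale * B A)^T
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    @torch.no_grad()
    def merge_(self):
        # Fold the accumulated low-rank update into W, then reset B,
        # roughly mimicking how full fine-tuning keeps updating W in place.
        self.base.weight += self.scale * (self.B @ self.A)
        self.B.zero_()
```

You'd call merge_() every N steps from the training loop; how often is a hyperparameter, and this is just the shape of the idea, not a tuned recipe.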
I think fundamentally, LoRA is sound because the gradient matrix is low-rank by nature.
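
Toy check of that intuition (NumPy, made-up sizes): for a single linear layer y = x @ W.T, the weight gradient is dY.T @ X, so its rank is at most the batch size, far below the weight dimensions. This is only the single-layer, single-batch case, but it's the basic intuition behind the claim.

```python
import numpy as np

d_in, d_out, batch = 1024, 1024, 8
X = np.random.randn(batch, d_in)       # layer input
dY = np.random.randn(batch, d_out)     # upstream gradient w.r.t. the output
grad_W = dY.T @ X                      # weight gradient, shape (d_out, d_in)
print(np.linalg.matrix_rank(grad_W))   # <= 8, nowhere near 1024
```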