TLDR:
1. Use alpha = 2*rank.
2. Don't use ranks that are too small (rank = 1 to 8).
3. The title is sensational. A better title would be "LoRA works if done right".
4. SVD init wasn't tested.
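
For anyone who wants the first two points in code form, here is a rough sketch. It assumes the standard LoRA convention where the low-rank update is scaled by alpha / rank, so alpha = 2*rank always gives a scaling factor of 2. The names `LoraSettings`, `suggest_lora_settings` and `MIN_RECOMMENDED_RANK` are made up for illustration and don't come from any library, and the cutoff of 9 is just my reading of "rank = 1 to 8 is too small".

```python
import warnings
from dataclasses import dataclass

# Assumption: the TLDR treats ranks 1 to 8 as too small, so 9 is the first "ok" rank.
MIN_RECOMMENDED_RANK = 9


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the two LoRA hyperparameters discussed above."""
    rank: int
    alpha: int

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update (B @ A) by alpha / rank.
        return self.alpha / self.rank


def suggest_lora_settings(rank: int) -> LoraSettings:
    """Apply the alpha = 2*rank rule of thumb and warn on very small ranks."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    if rank < MIN_RECOMMENDED_RANK:
        warnings.warn(
            f"rank={rank} is in the 1 to 8 range the TLDR advises against",
            stacklevel=2,
        )
    return LoraSettings(rank=rank, alpha=2 * rank)


if __name__ == "__main__":
    settings = suggest_lora_settings(16)
    print(settings.rank, settings.alpha, settings.scaling)
```

If you use a training library, you would pass `settings.rank` and `settings.alpha` to whatever it calls its rank and alpha parameters; check its docs for the exact names.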
Thanks for the TLDR. Yeah, that pretty much fits my experience, though I mainly cared about performance on the specific task I was training for, not about regressions on unrelated tasks.