A superior alternative to the standard Muon and AdamW optimizers for training large models.
Fantastic work, instantly valuable, immediately usable.
A big THANK YOU to the authors:
Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao