That still looks like a “converge faster” paper.
https://arxiv.org/abs/2006.10732
The paper above gives a more nuanced theoretical view: the inductive bias of GD is probably better unless your model is misspecified.