That still looks like a “converge faster” paper.

vladf • today at 2:33 AM

The above provides a nuanced theoretical view. GD's inductive bias is probably better unless your model is misspecified.
