Hacker News

vladf today at 2:33 AM

That still looks like a “converge faster” paper.

https://arxiv.org/abs/2006.10732

The paper above gives a more nuanced theoretical view: plain gradient descent's (GD's) inductive bias is probably better unless your model is misspecified.
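To make "inductive bias" concrete, here is a toy sketch of my own, not taken from the linked paper. On an underdetermined least-squares problem, plain GD started at zero converges to the minimum-ℓ2-norm interpolant. Preconditioned GD, x ← x − η·P·Xᵀ(Xx − y) with a fixed positive-definite P, converges to the interpolant minimizing xᵀP⁻¹x instead. The function name `preconditioned_gd`, the diagonal P = diag(1, 4), the step size, and the 1×2 problem are all arbitrary illustrative choices.

```python
def preconditioned_gd(X, y, P_diag, lr=0.05, steps=5000):
    """Toy preconditioned GD on 0.5*||Xx - y||^2, started at x = 0.

    Update: x <- x - lr * P X^T (X x - y), with P diagonal (given as P_diag).
    P_diag = all ones recovers plain gradient descent.
    """
    n = len(X[0])
    x = [0.0] * n  # zero init matters: iterates stay in the span of P X^T
    for _ in range(steps):
        # residual r = X x - y
        r = [sum(row[j] * x[j] for j in range(n)) - yi for row, yi in zip(X, y)]
        # gradient g = X^T r
        g = [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(n)]
        # preconditioned step
        x = [x[j] - lr * P_diag[j] * g[j] for j in range(n)]
    return x


# One equation, two unknowns: x1 + x2 = 2 has infinitely many solutions.
X, y = [[1.0, 1.0]], [2.0]
gd = preconditioned_gd(X, y, [1.0, 1.0])   # approximately [1.0, 1.0] (min L2 norm)
pgd = preconditioned_gd(X, y, [1.0, 4.0])  # approximately [0.4, 1.6] (min x1^2 + x2^2/4)
print(gd, pgd)
```

Both runs fit the data exactly, yet they land on different solutions. A preconditioner can change which solution you end up at, not just how fast you get there, and which of those solutions generalizes better depends on the problem.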