Hacker News

sdpmas · today at 12:35 AM · 2 replies

yes! Typically the optimizer that trains faster also gets better data efficiency. It may not be absolutely true, but that has been my observation so far. Also see https://arxiv.org/pdf/2510.09378 for second-order methods.


Replies

vladf · today at 2:33 AM

That still looks like a “converge faster” paper.

https://arxiv.org/abs/2006.10732

The above provides a more nuanced theoretical view: GD's inductive bias is probably better unless your model is misspecified.

alyxya · today at 1:21 AM

Fundamentally, I don't believe second-order methods get better data efficiency by themselves, but changes to the optimizer can, because the convergence behavior changes. ML theory lags behind the results in practice.
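The distinction the thread is drawing — "converges faster in steps" vs. "needs less data" — can be illustrated with a toy sketch (my own example, not from either paper): on a quadratic loss, a full Newton step jumps to the minimizer in one update while plain gradient descent takes many, yet both land on the same solution, so speed alone says nothing about what a different amount of data would buy you.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 w^T A w - b^T w, with A symmetric
# positive definite. The problem sizes and constants are arbitrary.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)      # Hessian of f
b = rng.standard_normal(5)
w_star = np.linalg.solve(A, b)   # exact minimizer

# Second-order (Newton) update: w <- w - H^{-1} grad.
# On a quadratic this reaches w_star in a single step.
w0 = np.zeros(5)
w_newton = w0 - np.linalg.solve(A, A @ w0 - b)

# Plain gradient descent with step size 1/L (L = largest eigenvalue),
# run for a handful of steps — it only gets close to w_star.
L = np.linalg.eigvalsh(A).max()
w_gd = np.zeros(5)
for _ in range(10):
    w_gd -= (1.0 / L) * (A @ w_gd - b)

print(np.linalg.norm(w_newton - w_star))  # near machine precision
print(np.linalg.norm(w_gd - w_star))      # larger residual error
```

Both optimizers minimize the same loss on the same data; the difference is only how many steps it takes, which is exactly why "faster convergence" results don't automatically translate into "better data efficiency" claims.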