Hacker News

throwaway1988460 · 1/15/2026

I recently used these methods, and BFGS worked better than CG for me.


Replies

hodgehog11 · 1/16/2026

Absolutely plausible (BFGS is excellent), but this is situation-dependent (no free lunch and all that). In the context of training neural networks, it gets even more complicated once you account for the implicit regularisation induced by the optimizer. It's often worth trying an SGD-type optimizer, BFGS, and a Newton variant to see which works best for a particular problem.
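For a deterministic objective, a quick way to run this kind of comparison is `scipy.optimize.minimize`, which exposes CG, BFGS, and Newton-CG behind one interface. A minimal sketch on the Rosenbrock test function (the function choice and starting point are illustrative, not from the thread):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])  # classic Rosenbrock starting point

results = {}
for method in ("CG", "BFGS", "Newton-CG"):
    # Newton-CG additionally needs second-order information (here the
    # full Hessian); CG and BFGS use only the gradient.
    kwargs = {"hess": rosen_hess} if method == "Newton-CG" else {}
    res = minimize(rosen, x0, method=method, jac=rosen_der, **kwargs)
    results[method] = res
    print(f"{method:9s} iters={res.nit:4d} f={res.fun:.3e}")
```

All three should converge here, but iteration counts differ, and on a different objective the ranking can flip, which is the "no free lunch" point above. For stochastic neural-network training, these full-batch methods are usually swapped for SGD/Adam-style optimizers.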