Hacker News

xg15 · last Saturday at 4:54 PM · 2 replies

(2021), still very interesting. The "post-overfitting" training strategy in particular is unexpected.


Replies

dev_hugepages · yesterday at 6:32 AM

This is talking about the double descent phenomenon (https://en.wikipedia.org/wiki/Double_descent)
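For anyone who wants to see the model-wise version of the effect directly, here is a minimal sketch (not from the thread, purely illustrative): minimum-norm least squares on random ReLU features typically shows test error peaking near the interpolation threshold (number of features ≈ number of training points) and then falling again as the model keeps growing. All sizes, seeds, and the toy target function are arbitrary choices.

```python
# Illustrative sketch of model-wise double descent with random ReLU features.
# Test error usually peaks near n_features ~ n_train, then drops again.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 5

def target(X):
    return np.sin(X @ np.ones(d))  # arbitrary smooth target function

X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
y_tr = target(X_tr) + 0.1 * rng.normal(size=n_train)  # noisy training labels
y_te = target(X_te)

for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, n_features))   # fixed random projection
    Phi_tr = np.maximum(X_tr @ W, 0)       # ReLU random features
    Phi_te = np.maximum(X_te @ W, 0)
    # Minimum-norm least-squares fit; pinv handles the overparameterized case
    beta = np.linalg.pinv(Phi_tr) @ y_tr
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"{n_features:5d} features  test MSE = {test_mse:.3f}")
```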

luckystarr · last Saturday at 8:33 PM

I vaguely remember this being observed when training GPT-3 (probably?) as well. They just kept training, and the error went up and then came back down again, like a phase transition in the model.
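The epoch-wise version of this ("just keep training past overfitting") can be poked at with a toy loop like the one below. This is only a hedged sketch: whether validation error actually rises and then falls again (grokking / epoch-wise double descent) depends heavily on the task, model size, and regularization, and this tiny synthetic setup is not guaranteed to reproduce it. The data, architecture, and hyperparameters are all illustrative assumptions.

```python
# Sketch: keep training an overparameterized MLP far past ~0 training loss
# while logging validation loss, to look for a rise-then-fall pattern.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)
y = (X[:, 0] * X[:, 1] > 0).long()          # simple nonlinear labeling rule
X_tr, y_tr, X_va, y_va = X[:128], y[:128], X[128:], y[128:]

model = nn.Sequential(nn.Linear(20, 512), nn.ReLU(), nn.Linear(512, 2))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20001):                   # far past the overfitting point
    opt.zero_grad()
    loss = loss_fn(model(X_tr), y_tr)
    loss.backward()
    opt.step()
    if epoch % 1000 == 0:
        with torch.no_grad():
            va_loss = loss_fn(model(X_va), y_va).item()
        print(f"epoch {epoch:6d}  train {loss.item():.4f}  val {va_loss:.4f}")
```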