Hacker News

noosphr today at 12:38 PM

>We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.


Replies

smallerize today at 2:53 PM

Does this mean that if your model is "overfitting", the solution is to train for even more epochs?

ForceBru today at 1:43 PM

Right, isn't double descent one of the reasons why modern Extremely Large Language Models work at all? I think I heard somewhere that basically all of today's "smart" (reasoning, solving math problems, etc.) LLMs are trained in the "double descent" territory (whatever this means, I'm not entirely sure).
