> This is likely because LLMs are typically trained for only a single epoch over massive datasets, which is in contrast to the multi-hundred-epoch training regimes for which dropout was first introduced.
Wait, is this true? That seems like a bold claim to make, and relatively unsubstantiated.
No, this is well known. See Table 2.2 in the GPT-3 paper, which lists how many epochs elapsed over each dataset component when training on ~300B tokens.
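To see why that table supports the claim, here is a minimal sketch of the arithmetic behind an "epochs elapsed" column: the effective number of epochs over a dataset is just (total training tokens × its sampling weight) / (tokens in the dataset). The dataset sizes and weights below are illustrative placeholders in the rough ballpark of the GPT-3 mix, not the paper's exact values; check Table 2.2 itself for those.

```python
# Sketch of the "epochs elapsed" arithmetic, with placeholder numbers.
TOTAL_TRAINING_TOKENS = 300e9  # GPT-3 trained on roughly 300B tokens

# dataset name -> (tokens in dataset, sampling weight in the training mix)
# These values are illustrative, not copied from Table 2.2.
mix = {
    "web crawl (filtered)": (410e9, 0.60),
    "web text":             (19e9,  0.22),
    "books":                (67e9,  0.15),
    "wikipedia":            (3e9,   0.03),
}

for name, (dataset_tokens, weight) in mix.items():
    # Tokens drawn from this dataset over the whole training run.
    tokens_seen = TOTAL_TRAINING_TOKENS * weight
    # Effective number of passes over the dataset.
    epochs = tokens_seen / dataset_tokens
    print(f"{name:22s} ~{epochs:.2f} epochs")
```

With numbers like these, the bulk of the data (the large web crawl) is traversed well under one full epoch, while only the small curated sets get oversampled to a few epochs, which is what "trained for only a single epoch" means in practice.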