luckystarr, last Saturday at 8:33 PM

I vaguely remember that this was observed when training GPT-3 (probably?) as well. They just kept training on and on, and the error went up and then came back down again, like a phase transition in the model.