Hacker News

buttered_toast · yesterday at 10:19 PM

Okay I see what you mean, and yeah that sounds reasonable too. Do you have any context on that first part? I would like to know more about how/why they might not have been able to pursue more training runs.


Replies

verdverm · yesterday at 10:47 PM

I have not done it myself (don't have the dinero), but my understanding is that there are many runs, restarts, and adjustments at this phase. It's surprisingly fragile, as I understand it.

If you already have a good base model, it's not likely that much has changed in the data since a year ago that would create meaningful differences at this phase (the architecture is different, but I know less there). If it is indeed true, it's a datapoint to add to the others signaling internal churn (everybody has some amount of this; it's just not good when it makes the headlines).

Distillation is also a powerful training method, and there are many ways to stay with the pack without new pre-training runs. It's pretty much what we see from all of them with the minor versions. So coming back to it, the speculation is that OpenAI is still on their 4.x pre-train, but that doesn't impede all progress.
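
For anyone unfamiliar with the technique: in its simplest form, distillation trains a "student" model to match a stronger "teacher" model's output distribution rather than only the hard labels in the data. This is just a generic minimal sketch (PyTorch-style, hypothetical names, not anything from a specific lab):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then push the
        # student's distribution toward the teacher's via KL divergence.
        t = temperature
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        # Scale by t^2 so gradient magnitudes stay comparable to a hard-label loss.
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * (t * t)

    # Typical usage: mix with the ordinary next-token cross-entropy on real data,
    # keeping the teacher frozen (detach so no gradients flow into it).
    # loss = ce_loss + alpha * distillation_loss(student_logits,
    #                                            teacher_logits.detach())

The point being: you can keep improving shipped models this way off an existing pre-trained base, which is consistent with the "minor version" cadence mentioned above.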