Hacker News

QuadmasterXLII · yesterday at 11:17 PM

You can't directly compare losses across phases because (I think) they changed the data distribution for each phase. It's 100% guaranteed that they changed the data distribution after the 10-trillion-token mark, since that's when they start adding in instruction-following data, but I don't know for sure whether the other phase changes also include data distribution changes.
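To make the point concrete: loss is an average over whatever data you evaluate on, so the same frozen model reports different losses on different data mixes. Here is a minimal sketch, assuming a toy unigram "model" and two made-up token mixes standing in for two training phases. None of these numbers come from the actual training run.

```python
import math

def cross_entropy(model_probs, tokens):
    """Average negative log-likelihood (nats/token) of tokens under a fixed unigram model."""
    return -sum(math.log(model_probs[t]) for t in tokens) / len(tokens)

# Hypothetical fixed "model": a unigram distribution over a toy vocabulary.
# It never changes below; only the data it is scored on changes.
model = {"a": 0.5, "b": 0.3, "c": 0.2}

# Hypothetical data mixes standing in for two training phases.
phase_1_data = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
phase_2_data = ["a"] * 2 + ["b"] * 3 + ["c"] * 5

loss_1 = cross_entropy(model, phase_1_data)
loss_2 = cross_entropy(model, phase_2_data)
print(f"phase 1 loss: {loss_1:.3f}, phase 2 loss: {loss_2:.3f}")
```

The model is identical in both cases, yet the loss jumps. A jump or drop at a phase boundary can therefore reflect the new data mix rather than any change in how good the model is.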