
moffkalast, yesterday at 10:39 AM

> trained from scratch on 80B tokens of historical data

How can this thing possibly be even remotely coherent with just a fine-tuning-sized amount of data used for pretraining?