Frontier model developers have licensed access to a huge volume of training data that isn't available on the public web.