They have not; every successful recent pre-training run has shown performance gains greater than what the scaling laws predict.
Those gains are architecture-based, data-quality-based, etc. Scaling laws relate only to data volume and compute, holding other factors constant.
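To make the "holding other factors constant" point concrete, here is a minimal sketch of a Chinchilla-style parametric scaling law (coefficients are the published fits from Hoffmann et al., 2022). Note the predicted loss depends only on parameter count N and training tokens D; architecture and data quality don't appear anywhere in the formula, so improvements from those show up as gains beyond what the law predicts.

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss L(N, D) = E + A/N^alpha + B/D^beta.

    Coefficients are the Chinchilla (Hoffmann et al., 2022) fitted values;
    only data volume and compute-proxy terms appear -- architecture and
    data quality are implicitly held constant.
    """
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling tokens at fixed model size lowers the predicted loss only modestly;
# any larger real-world gain comes from factors the law doesn't model.
l1 = chinchilla_loss(70e9, 1.4e12)  # roughly Chinchilla-scale run
l2 = chinchilla_loss(70e9, 2.8e12)  # same model, 2x training tokens
print(l1, l2)
```

The specific run sizes here are illustrative, not taken from the thread.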