Jeff Dean co-authored a 2007 paper that has proto scaling-law plots for n-gram language models.

ekelsen • today at 6:02 AM

https://aclanthology.org/anthology-files/anthology-files/pdf...

Replies

Nice find! The final paragraph of the Conclusion is amazingly prescient!

"Significantly, we found that translation quality as indicated by BLEU score continues to improve with increasing language model size, at even the largest sizes considered. This finding underscores the value of being able to train and apply very large language models, and suggests that further performance gains may be had by pursuing this direction further."
