
alansaber · yesterday at 3:33 PM

One caveat: smaller batch sizes are generally better for model stability, but we go bigger because larger batches substantially speed up training.
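
A minimal PyTorch sketch of the knob being discussed (the dataset, model, and the batch_size value of 256 are illustrative assumptions, not anything from this thread):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 1024 samples, 16 features, scalar target.
data = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))

# batch_size is the tradeoff: smaller batches mean noisier per-step
# gradients, larger batches mean fewer optimizer steps per epoch and
# better hardware utilization, so wall-clock training is faster.
loader = DataLoader(data, batch_size=256, shuffle=True)

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# One epoch: at batch_size=256 this is 4 steps; at 32 it would be 32.
for x, y in loader:
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```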