
acoustics · yesterday at 6:09 PM

The number of tokens a model is trained on is separate from the model's size.

Gemma 3 270M was trained on 6 trillion tokens but can be loaded into a few hundred million bytes of memory.
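For intuition, the weights' in-memory footprint is roughly parameter count times bytes per parameter; the training-token count never enters that math. A minimal back-of-the-envelope sketch (the `approx_model_size_bytes` helper is just for illustration, and it ignores activations, KV cache, and runtime overhead):

```python
def approx_model_size_bytes(n_params: float, bytes_per_param: float) -> float:
    """Rough in-memory size of a model's weights only."""
    return n_params * bytes_per_param

# Gemma 3 270M: ~270 million parameters, at various precisions.
for label, bpp in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    size_mb = approx_model_size_bytes(270e6, bpp) / 1e6
    print(f"{label}: ~{size_mb:.0f} MB")
# fp16 gives ~540 MB and int8 ~270 MB: "a few hundred million bytes",
# regardless of the 6 trillion tokens it was trained on.
```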

But yeah GPT-4 is certainly way bigger than 45GB.