The number of tokens a model was trained on is separate from the model's size: training data passes through the weights, it isn't stored in them.
Gemma 3 270M was trained on 6 trillion tokens but can be loaded into a few hundred megabytes of memory.
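To make the back-of-the-envelope math concrete, here's a quick sketch: weight footprint is just parameter count times bytes per parameter, so precision is the only knob (numbers are approximate):

```python
# Rough memory footprint of model weights: parameters x bytes per parameter.
# Gemma 3 270M has roughly 270 million parameters.
PARAMS = 270_000_000  # approximate parameter count

for precision, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    megabytes = PARAMS * bytes_per_param / 1_000_000
    print(f"{precision}: ~{megabytes:,.0f} MB")

# fp32: ~1,080 MB, bf16: ~540 MB, int8: ~270 MB, int4: ~135 MB
# "a few hundred MB" checks out for 8- or 16-bit weights, and the
# 6T training tokens never enter this calculation at all.
```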
But yeah, GPT-4 is certainly way bigger than 45 GB.