Hacker News

colechristensen, today at 4:42 PM

No, training a state-of-the-art model involves on the order of 10 trillion tokens.

We're talking about a step that updates weights based on, say, between 10k and 1M tokens.
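
To put those two scales side by side, here's a rough back-of-the-envelope sketch. It only uses the ballpark figures from this comment (about 10 trillion pretraining tokens, 10k to 1M tokens per update); the constant and function names are made up for illustration.

```python
# Ballpark figures from the comment; orders of magnitude, not measurements.
PRETRAINING_TOKENS = 10 * 10**12   # ~10 trillion tokens
UPDATE_TOKENS_LOW = 10_000         # low end of the update step
UPDATE_TOKENS_HIGH = 1_000_000     # high end of the update step


def fraction_of_pretraining(update_tokens: int,
                            total_tokens: int = PRETRAINING_TOKENS) -> float:
    """Return the update size as a fraction of the pretraining token count."""
    return update_tokens / total_tokens


if __name__ == "__main__":
    for n in (UPDATE_TOKENS_LOW, UPDATE_TOKENS_HIGH):
        frac = fraction_of_pretraining(n)
        print(f"{n:,} tokens is about {frac:.0e} of the pretraining token count")
```

So even at the high end, the update sees something like one ten-millionth of the tokens a full pretraining run does.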


Replies

delis-thumbs-7e, today at 4:44 PM

I learned something. Thank you!