Hacker News

arendtio · yesterday at 4:02 PM

> By scaling up model parameters and leveraging substantial computational resources

So, how large is that new model?


Replies

marcd35 · yesterday at 6:13 PM

While Qwen2.5 was pre-trained on 18 trillion tokens, Qwen3 uses nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.

https://qwen.ai/blog?id=qwen3
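
The blog answers the training-data question rather than the parameter one; for the released checkpoints you can count parameters directly. A minimal sketch using Hugging Face transformers, assuming the public Qwen/Qwen3-8B checkpoint (that model id is my example, not something stated in the thread):

  # Count parameters of a released Qwen3 checkpoint.
  # Assumes transformers + torch are installed and the weights fit in memory.
  from transformers import AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto")
  total = sum(p.numel() for p in model.parameters())
  print(f"{total / 1e9:.2f}B parameters")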
