Hacker News

johndough, today at 9:08 AM

Do you have any clues for guessing the total model size? I don't see any limit on making models ridiculously large (besides training), and the scaling-laws paper showed that more parameters means better performance, so it would be a safe bet for companies that have more money than innovative spirit.


Replies

magicalhippo, today at 10:32 AM

> I do not see any limitations to making models ridiculously large (besides training)

From my understanding, the "besides training" part is a big issue. As I noted earlier[1], Qwen3 was much better than Qwen2.5, but the main difference was just more and better training data. The Qwen3.5-397B-A17B beat their 1T-parameter Qwen3-Max-Base, and again the biggest change was more and better training data.
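The data-vs-parameters trade-off can be sketched numerically. A minimal example, assuming the Chinchilla-style loss form L(N, D) = E + A/N^α + B/D^β with the fitted constants reported by Hoffmann et al. (2022); the 397B parameter count is just the figure mentioned above, and the token counts are hypothetical:

```python
# Chinchilla-style parametric loss (Hoffmann et al., 2022 fit):
# L(N, D) = E + A / N**alpha + B / D**beta
# E is the irreducible loss; the two power-law terms shrink as you add
# parameters (N) or training tokens (D).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens, under the assumed fit above."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Hold the parameter count fixed (~397B, as in the comment above) and
# vary only the training-token budget: the predicted loss keeps falling,
# which is the "more and better training data" effect.
n = 397e9
for d in (1e12, 10e12, 36e12):  # hypothetical 1T / 10T / 36T token budgets
    print(f"D = {d:.0e} tokens -> predicted loss {predicted_loss(n, d):.3f}")
```

The sketch only captures data *quantity*; data *quality* improvements, which the comment also credits, are not modeled by the fit at all.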

[1]: https://news.ycombinator.com/item?id=47089780