Hacker News

johndough, today at 9:08 AM

Do you have any clues for guessing the total model size? I don't see any limit on making models ridiculously large (besides training), and the scaling-laws paper showed that more parameters means better performance, so it would be a safe bet for companies that have more money than innovative spirit.


Replies

magicalhippo, today at 10:32 AM

> I do not see any limitations to making models ridiculously large (besides training)

From my understanding, the "besides training" part is a big issue. As I noted earlier[1], Qwen3 was much better than Qwen2.5, but the main difference was just more and better training data. The Qwen3.5-397B-A17B beat their 1T-parameter Qwen3-Max-Base, and again the biggest change was more and better training data.
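The data-vs-parameters trade-off can be sketched numerically. A minimal example, assuming the Chinchilla-style loss form L(N, D) = E + A/N^α + B/D^β with the fitted constants reported by Hoffmann et al. (2022); the 397B parameter count is just the figure mentioned above, and the token counts are hypothetical:

```python
# Chinchilla-style parametric loss (Hoffmann et al., 2022 fit):
# L(N, D) = E + A / N**alpha + B / D**beta
# E is the irreducible loss; the two power-law terms shrink as you add
# parameters (N) or training tokens (D).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens, under the assumed fit above."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Hold the parameter count fixed (~397B, as in the comment above) and
# vary only the training-token budget: the predicted loss keeps falling,
# which is the "more and better training data" effect.
n = 397e9
for d in (1e12, 10e12, 36e12):  # hypothetical 1T / 10T / 36T token budgets
    print(f"D = {d:.0e} tokens -> predicted loss {predicted_loss(n, d):.3f}")
```

The sketch only captures data *quantity*; data *quality* improvements, which the comment also credits, are not modeled by the fit at all.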

[1]: https://news.ycombinator.com/item?id=47089780