mynti • yesterday at 8:10 AM • 1 reply • view on HN

They trained it in 33 days for ~20m (which apparently includes not only the infrastructure but also salaries over a 6-month period). And the model is coming close to Qwen and DeepSeek. Pretty impressive.

Replies

zamadatix • yesterday at 9:40 PM

The price/scaling of training another same-class model always seems to be dropping through the floor, but training models which score much better seems to be hitting a brick wall.

E.g. gemini-3-pro tops the lmarena text chart today at 1488 vs 1346 for gpt-4o-2024-05-13. That's a win rate of roughly 70% (where 50% is an even chance of winning) after 1.5 years. Meanwhile, even the open weights stuff OpenAI gave away last summer scores between the two.
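
For anyone wanting to check the ~70% figure, here is a minimal sketch. It assumes the leaderboard scores behave like standard Elo ratings (logistic curve, base 10, scale 400); the comment doesn't say which formula was used, and the function name is just illustrative.

```python
def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win rate of A over B under the standard Elo logistic model.

    Assumption: scores are on the usual Elo scale (base 10, 400 points per
    10x odds). The leaderboard's actual fitting method may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the comment above.
p = elo_win_probability(1488, 1346)
print(f"{p:.3f}")  # 0.694
```

Under that assumption a 142-point gap works out to about a 69% expected win rate, which matches the rounded 70% above.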

The exception seems to be net new benchmarks/benchmark versions. These start out low and then either quickly get saturated or hit a similar wall after a while.
