himata4113 • yesterday at 5:27 PM • 3 replies

They're likely trained as models in the 2T to 3T parameter class. It's very unlikely that Chinese labs have access to GPU systems capable of training models like that, let alone serving them. This requires proprietary room-scale systems, which command a huge premium over typical 10-slot systems.

I am sure they can develop their own equivalent version of such clusters in around a year, though. Distilling fabel 5 will also go a long way.

Replies

logicprog • yesterday at 5:33 PM

DSv4 is nearly in the 2T range, but yes, you're generally right.

axpy906 • yesterday at 9:20 PM

We’ll see it distilled first.

OtomotO • yesterday at 6:33 PM

Ah, American hubris... I don't blame you; Hollywood is the greatest propaganda machine of all time.
