Hacker News

olliepro today at 1:01 PM

This would likely only get used for small finetuning jobs. It’s too slow for the scale of pretraining.


Replies

onion2k today at 1:37 PM

"It's too slow for the scale of pretraining."

There isn't really such a thing as 'too slow' as an objective fact, though. It depends on how much patience and money for electricity you have. In AI image-gen circles I see people complaining if a model takes more than 5s to generate an image, and other people on very limited hardware who happily wait half an hour per image. It's hard to make a judgement call about what 'too slow' means; it's quite subjective.

greenavocado today at 1:46 PM

So distribute copies of the model in RAM across multiple machines, have each machine update a different part of the model weights, and sync the updates over the network.
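A minimal sketch of that scheme, with the network simulated in-process (the `Worker` class, shard layout, and gradient are all made up for illustration, not from any real framework): every machine holds a full copy of the weights, updates only its assigned shard, then all shards are exchanged so every replica converges to the same state.

```python
import numpy as np

N_WORKERS = 4
N_PARAMS = 16

rng = np.random.default_rng(0)
# pretend gradient; assume every machine computed the same one
true_grad = rng.normal(size=N_PARAMS)

class Worker:
    def __init__(self, wid, weights):
        self.wid = wid
        self.weights = weights.copy()  # full copy of the model in RAM
        # contiguous shard of parameter indices this worker owns
        self.shard = slice(wid * N_PARAMS // N_WORKERS,
                           (wid + 1) * N_PARAMS // N_WORKERS)

    def local_update(self, lr=0.1):
        # update only this worker's shard of the weights
        self.weights[self.shard] -= lr * true_grad[self.shard]
        return self.shard, self.weights[self.shard].copy()

init = np.zeros(N_PARAMS)
workers = [Worker(i, init) for i in range(N_WORKERS)]

# each machine updates its own part of the weights...
updates = [w.local_update() for w in workers]

# ...then syncs the updates over the "network"
# (here: a shared list of (shard, values) messages)
for w in workers:
    for shard, values in updates:
        w.weights[shard] = values

# every replica now holds the same full SGD step
print(np.allclose(workers[0].weights, -0.1 * true_grad))  # True
```

In a real cluster the sync loop would be an all-gather collective rather than a Python list, and the main cost becomes the per-step network transfer, which is exactly why this trades speed for cheap scale-out.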
