Hacker News

genxy, yesterday at 6:01 PM

How good must their training pipelines be? Releasing publicly and at this rate has made them very efficient.


Replies

sleepyeldrazi, yesterday at 6:10 PM

Fine-tuning takes few resources; training the base model is the slow and expensive part. Architecturally, the 3.5 models are identical to their 3.6 counterparts, which is why the consensus is that they are probably fine-tunes rather than models re-trained from scratch, much like the fine-tunes you see many people publish on Hugging Face.
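One way to check the "architecturally identical" claim yourself is to diff the shape-defining fields of two checkpoints' config.json files. Below is a minimal sketch. The key names in ARCH_KEYS are an assumption based on common Hugging Face transformer configs and differ between model families, and the example configs are hypothetical. Matching shapes are consistent with a fine-tune but don't prove one, since a model re-trained from scratch can reuse the same architecture.

```python
import json
from pathlib import Path

# Shape-defining fields to compare. This list is an assumption: names follow
# common Hugging Face transformer configs and vary between model families.
ARCH_KEYS = (
    "model_type",
    "hidden_size",
    "intermediate_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "vocab_size",
)


def arch_diff(cfg_a: dict, cfg_b: dict, keys=ARCH_KEYS) -> dict:
    """Return {key: (value_a, value_b)} for every architectural key that differs.

    A key missing from both configs compares as None == None and is not reported.
    """
    return {
        k: (cfg_a.get(k), cfg_b.get(k))
        for k in keys
        if cfg_a.get(k) != cfg_b.get(k)
    }


def load_config(path: str) -> dict:
    """Read a config.json file from a locally downloaded model directory."""
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    # Hypothetical configs standing in for two checkpoints.
    old = {"model_type": "example", "hidden_size": 4096, "num_hidden_layers": 32}
    new = dict(old, torch_dtype="bfloat16")  # only a non-architectural field differs
    print(arch_diff(old, new))  # {}
```

An empty result means none of the listed fields differ; with real checkpoints you would call arch_diff(load_config("a/config.json"), load_config("b/config.json")) on paths of your own.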
