I’ve seen comments saying that many foundation model providers like DeepSeek haven’t done a full pretraining run in a long time. Does that mean this chip usage mostly refers to past work?
Even if they're not doing full from-scratch training every cycle, any serious model update still soaks up GPU hours.
Whilst there aren't many papers on the matter, my guess is that pretraining from scratch is a bit of a waste of money when you could simply expand the depth/width of the 'old' model and train only the 'new' part (rough sketch below).
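To make that concrete, here's a minimal, hypothetical sketch of the depth-expansion idea in PyTorch: append fresh transformer layers on top of a frozen "old" stack and give the optimizer only the new parameters. All the names and sizes here are made up for illustration; in practice the old layers would be loaded from a pretrained checkpoint, and width expansion would additionally require copying/tiling existing weights.

```python
import torch
import torch.nn as nn

# Hypothetical config: 6 "old" (pretrained) layers, 2 newly added layers.
D_MODEL, N_HEADS, OLD_LAYERS, NEW_LAYERS = 512, 8, 6, 2

def make_layer():
    return nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS, batch_first=True)

# Pretend these carry pretrained weights (loaded from a checkpoint in reality).
old_stack = nn.ModuleList(make_layer() for _ in range(OLD_LAYERS))
new_stack = nn.ModuleList(make_layer() for _ in range(NEW_LAYERS))

# Freeze the old model so only the expansion gets trained.
for p in old_stack.parameters():
    p.requires_grad = False

model = nn.Sequential(*old_stack, *new_stack)

# Optimizer only sees the trainable (new) parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# Dummy training step on random data just to show the mechanics.
x = torch.randn(4, 128, D_MODEL)   # (batch, seq_len, d_model)
out = model(x)
loss = out.pow(2).mean()           # placeholder loss
loss.backward()
optimizer.step()
```

Even then, the "new" layers still have to be trained on a lot of tokens for the combined model to be coherent, so this would reduce, not eliminate, the GPU bill.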