Hacker News

rockinghigh · yesterday at 7:46 PM

They add new data to the existing base model via continuous pre-training. You save on the bulk of pre-training (the next-token-prediction task), but still have to re-run the mid- and post-training stages: context-length extension, supervised fine-tuning, reinforcement learning, safety alignment, and so on. The continued pre-training step itself is sketched below.
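For concreteness, here is a minimal sketch of that step using Hugging Face transformers. The base checkpoint, corpus file, and hyperparameters are placeholders for illustration, not what any particular lab actually uses:

    # Continued pre-training: resume the next-token-prediction objective on new data.
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)
    from datasets import load_dataset

    base = "gpt2"  # stand-in for the existing base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(base)

    # New corpus to fold into the model: plain text, one document per line.
    new_data = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]
    train = new_data.map(
        lambda b: tokenizer(b["text"], truncation=True, max_length=1024),
        batched=True, remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="ckpt-continued",
            per_device_train_batch_size=4,
            learning_rate=1e-5,   # far lower than the initial pre-training LR
            num_train_epochs=1,
        ),
        train_dataset=train,
        # Causal LM loss, i.e. plain next-token prediction (no masking).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    # Context-length extension, SFT, RL, and safety alignment still follow this.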


Replies

astrange · today at 12:15 AM

Continuous pretraining has issues because the model starts forgetting what it learned from the older data (catastrophic forgetting). There is some research into other approaches.
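One common mitigation (not necessarily the research being alluded to here) is replay: mix a fraction of the original pre-training data back in with the new corpus so the model keeps rehearsing the old distribution. A rough sketch with Hugging Face datasets, with made-up file names and mixing ratio:

    from datasets import load_dataset, interleave_datasets

    old = load_dataset("text", data_files={"train": "old_corpus_sample.txt"})["train"]
    new = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]

    # Sample ~30% replayed old data and ~70% new data during continued pre-training.
    mixed = interleave_datasets(
        [old, new],
        probabilities=[0.3, 0.7],
        seed=42,
        stopping_strategy="all_exhausted",
    )
    # `mixed` would then feed the same training loop shown above.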