I’d like to know how they chat-tuned it. Getting the base model is one thing; did they also make a bunch of conversations for SFT, and if so, how was it done?
We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).
So they are chat tuning. I wonder what “minimizing interference with normative judgments” really amounts to and how objective it is.

You could extract quoted speech from the data (especially in Q&A format) and treat that as “chat” for the model to learn from; a rough sketch of what that extraction could look like is below.
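A minimal sketch of the quoted-speech idea, assuming nothing about the project's actual pipeline: the regex and the "a question followed by a reply" pairing heuristic are purely illustrative.

```python
import re

# Hypothetical sketch: pull question/answer pairs out of quoted dialogue in a
# historical passage and emit them as chat-style SFT examples. Not the
# project's actual extraction method.
QUOTE_RE = re.compile(r'[“"]([^”"]+)[”"]')

def extract_chat_pairs(passage: str):
    """Pair each quoted question with the quotation that follows it."""
    quotes = QUOTE_RE.findall(passage)
    pairs = []
    for cur, nxt in zip(quotes, quotes[1:]):
        if cur.rstrip().endswith("?"):  # crude "this is a question" test
            pairs.append({
                "messages": [
                    {"role": "user", "content": cur.strip()},
                    {"role": "assistant", "content": nxt.strip()},
                ]
            })
    return pairs

sample = ('"What news from the Congress of Vienna?" asked the envoy. '
          '"The powers have settled the Polish question," came the reply.')
print(extract_chat_pairs(sample))
```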
They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
Basically using GPT-5 and being careful.
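If "using GPT-5 and being careful" means having it turn pretraining passages into conversations without injecting present-day framing, it might look roughly like the sketch below. This assumes the OpenAI chat completions API; the system prompt wording is made up here, and the repo linked above documents the actual procedure.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical prompt: rewrite a passage as a Q&A exchange while forbidding the
# model from adding modern commentary or judgments not present in the source.
SYSTEM = (
    "Rewrite the passage as a short question-and-answer exchange. "
    "Use only facts and judgments present in the passage itself; "
    "do not add modern commentary, caveats, or evaluations."
)

def make_sft_example(passage: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": passage},
        ],
    )
    return resp.choices[0].message.content
```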