> Writing code is the default behavior from pre-training
What does this even mean? Could you expand on it?
During pre-training the model is learning next-token prediction, which is naturally additive. Even if you added DEL as a token, it would still be quite hard to restructure the data so that it can be used in a next-token prediction task. Hope that helps.
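A minimal sketch of what that objective looks like, assuming a toy model that maps token ids to logits (names here are illustrative, not any specific library's API): every training signal is "given this prefix, predict the token to append next", so there is no notion of deleting or editing what came before.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    # token_ids: (batch, seq_len) integer tensor of a text sample
    inputs  = token_ids[:, :-1]   # prefix the model conditions on
    targets = token_ids[:, 1:]    # the token it is asked to append
    logits  = model(inputs)       # (batch, seq_len - 1, vocab_size)
    # Loss only rewards predicting the next token to add, never an edit.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```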
He means that it is heavily biased toward writing code, not removing, condensing, or refactoring it. It wants to generate more stuff, not less.