My initial thought there is that you'd have an imbalance. Many token patterns would almost never come up with the assistant tag on them, for example words with typos in them.