You could duplicate every token in the vocabulary and reserve the duplicates exclusively for the chain-of-thought. Because the duplicates would never be valid in user input, they could be robustly filtered out of it. Basically, you'd be adding a "thought" bit to each token.
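To make that concrete, here is a rough sketch of the ID arithmetic, assuming a base vocabulary of `VOCAB_SIZE` tokens where the "thought" copy of token `t` is simply `t + VOCAB_SIZE`. The vocabulary size and all function names are made up for illustration and don't come from any real tokenizer library.

```python
from typing import List

# Hypothetical base vocabulary size; a real tokenizer would supply its own.
VOCAB_SIZE = 50_000


def to_thought(token_id: int) -> int:
    """Map a plain token id to its chain-of-thought duplicate (sets the 'thought' bit)."""
    if not 0 <= token_id < VOCAB_SIZE:
        raise ValueError(f"not a plain token id: {token_id}")
    return token_id + VOCAB_SIZE


def is_thought(token_id: int) -> bool:
    """True if the id lies in the duplicated (thought-only) half of the vocabulary."""
    return VOCAB_SIZE <= token_id < 2 * VOCAB_SIZE


def to_plain(token_id: int) -> int:
    """Clear the 'thought' bit, e.g. to decode a thought token back to text."""
    return token_id - VOCAB_SIZE if is_thought(token_id) else token_id


def sanitize_user_input(token_ids: List[int]) -> List[int]:
    """Keep only plain token ids, so user input can never contain thought tokens."""
    return [t for t in token_ids if 0 <= t < VOCAB_SIZE]
```

With this layout the filter is a single range check on the ID, which is what makes it robust: it doesn't depend on spotting special delimiter strings in the text. The cost, under these assumptions, is an embedding table and output layer twice as large.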