I think separating thinking tokens from "representing" tokens might be a better approach, like what those thinking models do.