> and are trying to justify it in reverse
In split-brain experiments, this is exactly how one half of the brain retroactively justifies the actions of the other half. Maybe in LLMs an overpowered latent feature sets the overall direction of the "thought", and the rest of inference just has to make the best of it.