"Sticks to whatever alignment is imparted" assumes that what gets imparted is alignment itself, rather than alignment-performance on the training distribution.
A coherent world model could make a system more consistently aligned. It could also make it more consistently aligned-seeming. Coherence is a multiplier, not a direction.