Because math. The training data you would need for an LLM to break properly encrypted information is ciphertext, and good ciphertext is by design indistinguishable from random bytes.
How do you train a model when the input has no apparent correlation to the output?
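To make the "no apparent correlation" point concrete, here is a rough sketch. It assumes HMAC-SHA256 as a stand-in for a real cipher (the Python standard library ships no AES, and this is not actual encryption since it cannot be decrypted); the helper names `toy_encrypt`, `bit_difference`, and `average_flipped_bits` are made up for illustration. It flips a single input bit and counts how many output bits change.

```python
import hashlib
import hmac
import os


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # Assumption: HMAC-SHA256 as a keyed pseudorandom function, standing in
    # for a real cipher. It only models the "output looks random" property.
    return hmac.new(key, block, hashlib.sha256).digest()


def bit_difference(a: bytes, b: bytes) -> int:
    # Number of bit positions where the two byte strings differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def average_flipped_bits(trials: int = 1000) -> float:
    # Average number of output bits (out of 256) that change when exactly
    # one input bit is flipped, under a fixed random key.
    key = os.urandom(32)
    total = 0
    for _ in range(trials):
        plaintext = bytearray(os.urandom(16))
        before = toy_encrypt(key, bytes(plaintext))
        plaintext[0] ^= 0x01  # flip a single input bit
        after = toy_encrypt(key, bytes(plaintext))
        total += bit_difference(before, after)
    return total / trials


if __name__ == "__main__":
    # Typically lands near 128, i.e. about half of the 256 output bits.
    print(average_flipped_bits())
```

If the result hovers around half the output bits, a one-bit change in the input is about as informative as a coin flip per output bit, which leaves a model with no gradient to follow from ciphertext back to plaintext.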