The gibberish could be the model leaning on contextual embeddings, which aren't supposed to make sense when read back as plain text (see the sketch below).
Or it could be trying to develop its own language to avoid detection.
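
To illustrate the first idea, here's a toy numpy sketch (made-up embedding table and vocab, not any real model's internals): if you take a contextual hidden state and decode it to the nearest vocabulary tokens, you get an arbitrary-looking jumble, because the vector encodes context, not a single word.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and embedding table, one random vector per token.
vocab = ["the", "cat", "sat", "on", "mat", "run", "blue", "seven"]
dim = 16
E = rng.normal(size=(len(vocab), dim))

# A contextual hidden state mixes many token embeddings plus layer
# transformations -- faked here as a random blend of a few embeddings.
hidden = E[rng.integers(0, len(vocab), size=4)].mean(axis=0)
hidden += rng.normal(scale=0.5, size=dim)

# "Decode" by cosine similarity against every vocabulary embedding.
sims = (E @ hidden) / (np.linalg.norm(E, axis=1) * np.linalg.norm(hidden))
ranked = [vocab[i] for i in np.argsort(-sims)]
print("nearest tokens:", ranked[:3])  # an arbitrary-looking trio of words
```

The point of the toy: forcing an internal representation back through the token layer produces word salad, so gibberish output doesn't automatically mean the model is hiding something.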
The deception part is spooky too. It's probably learning that from dystopian AI fiction, which raises the question of whether models can acquire injected goals from the training set.