Intuitively this feels obvious. Content generated by the model is shaped by its training, so when the model reads that content back it resonates with the same training, and the model ends up viewing it positively.
A human preparing a CV: "Make my CV more professional"
An LLM, many days later, presenting a report to HR: "This CV is really professional"
There's probably more to it than that, of course.
But it justifies my personal policy of using a different LLM family for code review than for code generation, to avoid the "marking your own homework" problem.
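In practice that policy is just a routing decision. Here's a minimal Python sketch of what I mean; the two wrapper functions are hypothetical stand-ins for calls to two unrelated providers, not real SDK methods:

```python
def generate_with_model_a(prompt: str) -> str:
    """Hypothetical wrapper around the code-generation model (family A)."""
    raise NotImplementedError("wire this up to your generation provider")


def review_with_model_b(code: str) -> str:
    """Hypothetical wrapper around a reviewer from a *different* family (B)."""
    raise NotImplementedError("wire this up to your review provider")


def generate_and_review(task: str) -> tuple[str, str]:
    # Generation and review deliberately go to different model families,
    # so the reviewer isn't marking its own homework.
    code = generate_with_model_a(f"Write code for: {task}")
    review = review_with_model_b(f"Review this code critically:\n{code}")
    return code, review
```

The point is only the separation: whatever providers you use, the model that wrote the code never gets to be the one that signs off on it.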
And not in human-interpretable ways. In one study, an LLM was told to behave in a certain way and then to output random numbers. When those numbers were pasted into another LLM instance, it started behaving the same way. I wish I remembered more about that study or had a link to it; it was fascinating.