LLMs don't "know" anything. But as you say, they can identify correlations between content labeled "porn" and a target image, and between content labeled "children" and a target image. If a target image scores high on both, the model can flag it as child porn, all without being trained on CSAM.
But things correlated with porn != porn, and things correlated with children != children. For example, in our training set no porn contains children, so as far as the model can tell, the presence of children means it's not porn. Likewise, all the images of children are clothed, so no clothes means it's not a child. You know that's ridiculous because you know things; the AI does not.
Never mind the importance of context, such as distinguishing a partially clothed child playing on a beach from a partially clothed child in a sexual situation.
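
To make the training-set point concrete, here is a toy sketch. The attribute names `a`, `b`, and `c` are hypothetical stand-ins for any labels, and `conditional_rate` is a helper made up for illustration; no real classifier is this simple. It only shows that when two attributes never co-occur in the training data, counting alone concludes that one rules out the other.

```python
from collections import Counter

def conditional_rate(samples, given, target):
    """Estimate P(target present | given present) from raw co-occurrence counts."""
    counts = Counter()
    for attrs in samples:
        if given in attrs:
            counts["given"] += 1
            if target in attrs:
                counts["both"] += 1
    if counts["given"] == 0:
        return None  # the attribute never appears, so there is no evidence either way
    return counts["both"] / counts["given"]

# Hypothetical training set: attributes "a" and "b" never appear together.
training = [{"a"}, {"a"}, {"a", "c"}, {"b"}, {"b", "c"}, {"c"}]

rate = conditional_rate(training, given="b", target="a")
print(rate)  # 0.0: as far as the counts go, "b present" means "not a"
```

The estimate is exactly what the data supports, and it is still wrong about any real case where both attributes are present, because the training set never contained one.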