> How did these 2 models do it if not actually using language like a thinking agent?
By having a gazillion other, almost identical pictures of kids in parks in their training data.