I don't think they are actually intelligent. As a first approximation: fix all random seeds and every other source of randomness, run the same prompt twice, and check how intelligent that looks.
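For what it's worth, here is a minimal sketch of that experiment, with a toy sampler standing in for a real model. The names `toy_generate` and `VOCAB` are made up for illustration, and the "model" is just a fixed rule for token weights. A real LLM stack can have further sources of nondeterminism (GPU kernels, for instance) that this does not model.

```python
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["yes", "no", "maybe", "."]

def toy_generate(prompt, seed, n_tokens=8):
    """Sample n_tokens from a toy next-token distribution.

    The seeded RNG is the only source of randomness here, which is the
    assumption the experiment relies on.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(n_tokens):
        # Weights depend deterministically on the prompt and the tokens so far.
        weights = [1 + (len(prompt) + len(tokens) + i) % 3 for i in range(len(VOCAB))]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens

first = toy_generate("Is this intelligent?", seed=0)
second = toy_generate("Is this intelligent?", seed=0)
print(first == second)  # True: same prompt, same seed, same output
```

With every source of randomness pinned, the second run is a replay of the first, which is the point of the experiment.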
On a more technical level, very serious people have voiced doubts, for example Richard Sutton in an interview with Dwarkesh Patel [1].
[1] https://m.youtube.com/watch?v=21EYKqUsPfg&pp=ygUnZmF0aGVyIG9...