(As someone who worked closely with path tracing renderers and denoisers, I think I can answer this :) )
It's mostly because in the VFX/CG space, ray tracing/path tracing denoisers almost always rely on extra outputs/AOVs, things like 'albedo' (diffuse reflectance), normals, and world position, to help guide them.
So they can often 'cheat' a bit and know where the edges of things are (because, say, the object ID AOV changes - minus pixel filtering, which complicates things a bit).
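To make that concrete, here's a toy sketch (not any production denoiser; the names and sigmas are invented) of a cross-bilateral filter that takes its edge-stopping decision from an object-ID AOV instead of from the noisy pixels themselves:

```python
# Toy cross-bilateral filter: spatial weights come from pixel distance,
# but the edge-stopping weights come from a guide AOV (an object-ID
# buffer here) rather than the noisy image. O(h*w*r^2) - written for
# clarity, not speed.
import numpy as np

def cross_bilateral(noisy, object_id, radius=3, sigma_spatial=2.0):
    noisy = noisy.astype(np.float64)  # assume (h, w, 3) color
    h, w, _ = noisy.shape
    out = np.zeros_like(noisy)
    # Precompute the spatial Gaussian kernel once.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_spatial**2))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = noisy[y0:y1, x0:x1]
            # Hard edge-stop: only average pixels on the same object.
            same_obj = (object_id[y0:y1, x0:x1] == object_id[y, x])
            weights = spatial[y0 - y + radius:y1 - y + radius,
                              x0 - x + radius:x1 - x + radius] * same_obj
            out[y, x] = (window * weights[..., None]).sum(axis=(0, 1)) / weights.sum()
    return out
```

The key point is that the "do these two pixels belong to the same surface?" decision comes from clean renderer data, not from something inferred out of the noise.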
They can also 'cheat' in other ways, e.g. by mixing back in some of the diffuse texture detail from the 'albedo' AOV that the denoiser would otherwise have smoothed away.
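That trick usually amounts to demodulating the albedo before denoising and re-multiplying it afterwards, so the denoiser only ever sees the (much smoother) illumination. A rough sketch, with made-up names:

```python
import numpy as np

def denoise_with_albedo(noisy_beauty, albedo, denoise_fn, eps=1e-4):
    # Divide out the texture detail so the denoiser sees mostly
    # illumination, which is lower-frequency and easier to filter.
    illumination = noisy_beauty / np.maximum(albedo, eps)
    # Any denoiser can slot in here: the cross-bilateral sketch
    # above, NL-means, a neural network, etc.
    smooth_illumination = denoise_fn(illumination)
    # Re-modulate: the crisp texture detail comes back untouched,
    # because it never went through the denoiser at all.
    return smooth_illumination * albedo
```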
Cameras don't really have anything to guide them, so they have to guess. They often seem to use fairly primitive methods like bilateral filters (or at least things that look very similar) to try to separate detail from noise, and it doesn't work very well.
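For comparison, this is roughly all a single-image filter has to work with. Using OpenCV's built-in bilateral filter (parameter values and file names here are just illustrative), the edge decisions have to come from the noisy values themselves:

```python
import cv2

noisy = cv2.imread("noisy_photo.png")  # hypothetical input file
# d: neighborhood diameter; sigmaColor: how different two pixels can
# be and still get averaged together; sigmaSpace: spatial falloff.
# With no guide channels, strong noise and fine detail (hair, fabric)
# look much the same to the range term, so detail gets smeared.
denoised = cv2.bilateralFilter(noisy, d=9, sigmaColor=75, sigmaSpace=75)
cv2.imwrite("denoised_photo.png", denoised)
```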
Phone cameras in portrait mode can use depth sensors a bit to help, if the phone has them, but for things like hair strands that doesn't really work; the depth data is mostly useful for fake depth-of-field blurring.
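Something like this crude depth-keyed blend, sketched with invented names (real phone pipelines layer segmentation and matting on top of the raw depth):

```python
import numpy as np
import cv2

def fake_dof(image, depth, focus_depth, max_radius=15):
    # Blur strength grows with distance from the focal plane. Hair
    # strands fail here because the depth map is far too coarse to
    # resolve individual strands against the background.
    blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=max_radius / 3.0)
    # Normalized 0..1 blur amount per pixel.
    amount = np.clip(np.abs(depth - focus_depth) / depth.max(), 0.0, 1.0)
    amount = amount[..., None]  # broadcast over color channels
    return (image * (1 - amount) + blurred * amount).astype(image.dtype)
```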
Yeah, but surely ML models would at least work better than analytic algorithms. After all, when looking at a noisy picture, our brains are pretty good at distinguishing detail from noise, so it's not clear to me why an ML model couldn't have denoising performance similar to the human visual system, even if it doesn't match the "cheating" denoisers used in ray tracing.