That's how night mode works on Pixel phones, right? I believe it takes a few images in rapid succession and takes advantage of the noise being random: with some signal processing, averaging the frames yields a high-quality image from a noisy sensor.
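As a toy illustration of why stacking a burst helps (not Google's actual pipeline, which also handles alignment, motion, and tone mapping): zero-mean noise that is independent per frame averages down as 1/sqrt(N). All the values below are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

true_scene = np.full((64, 64), 100.0)  # hypothetical flat gray scene
# 16 noisy captures of the same scene, noise std = 20
frames = [true_scene + rng.normal(0, 20, true_scene.shape) for _ in range(16)]

stacked = np.mean(frames, axis=0)

print(f"single-frame noise std: {np.std(frames[0] - true_scene):.2f}")   # ~20
print(f"16-frame stack noise std: {np.std(stacked - true_scene):.2f}")  # ~20/sqrt(16) = 5
```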
Integrating over a longer time to get more accurate light measurements of a scene has been a principal feature of photography: you need to slow down the shutter and open up the aperture in dark conditions.
Combining multiple exposures is not fundamentally different from a single longer exposure; the key innovation is combining motion data with digital image stabilization, which lets smartphones approximate long exposures without a tripod. A minimal version of the align-then-stack idea is sketched below.
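A toy sketch of aligning frames before stacking, using phase correlation for integer-pixel shifts (real pipelines also use gyroscope data and sub-pixel warps; this assumes pure translation):

```python
import numpy as np

def estimate_shift(ref, frame):
    # Phase correlation: normalize the cross-power spectrum so its inverse
    # FFT is a sharp peak at the integer (dy, dx) shift of `frame` vs `ref`.
    cross = np.conj(np.fft.fft2(ref)) * np.fft.fft2(frame)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap large peak indices around to negative shifts.
    h, w = ref.shape
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return dy, dx

def align_and_stack(frames):
    # Undo each frame's shift relative to the first frame, then average.
    ref = frames[0]
    aligned = [ref]
    for f in frames[1:]:
        dy, dx = estimate_shift(ref, f)
        aligned.append(np.roll(f, (-dy, -dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)

# Demo: a frame rolled by (3, -2) is detected correctly.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
frame = np.roll(ref, (3, -2), axis=(0, 1))
print(estimate_shift(ref, frame))  # -> (3, -2)
```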
This is how we reduce noise in filmmaking. My de-noise node in DaVinci has two settings: spatial and temporal. Temporal references three frames on either side of the subject frame.
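A simplified sketch of that kind of temporal filter, averaging each frame with its neighbors on either side (Resolve's actual node motion-compensates first so moving objects don't ghost; this version assumes a static shot):

```python
import numpy as np

def temporal_denoise(frames, radius=3):
    # Average each frame with up to `radius` frames on either side.
    frames = np.asarray(frames, dtype=np.float64)
    out = np.empty_like(frames)
    n = len(frames)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)  # clip at clip edges
        out[i] = frames[lo:hi].mean(axis=0)
    return out
```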
Some phones shine an IR floodlight, too.
It can also let you identify positions within the image at a finer resolution than the pixels, or even the light itself, would otherwise allow.
In microscopy, this is called 'super-resolution'. You take many images over and over, and while the diffraction-limited spot of light is hundreds of nanometers across, you can calculate the centroid of whatever is producing that light to a precision far finer than the size of the spot itself.
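A toy numeric version of that centroid localization (real methods such as PALM/STORM fit a PSF model per emitter and repeat over many frames; the pixel pitch and spot width here are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(1)

pixel_nm = 100.0                # hypothetical 100 nm per pixel
true_x, true_y = 32.37, 30.81   # emitter position in pixel units
sigma = 2.5                     # PSF width ~250 nm, much wider than the error below

# Render a diffraction-limited spot and add photon shot noise.
yy, xx = np.mgrid[0:64, 0:64]
psf = np.exp(-((xx - true_x) ** 2 + (yy - true_y) ** 2) / (2 * sigma ** 2))
image = rng.poisson(psf * 500.0).astype(float)

# Intensity-weighted centroid recovers the position to sub-pixel precision.
total = image.sum()
est_x = (image * xx).sum() / total
est_y = (image * yy).sum() / total

err_nm = np.hypot(est_x - true_x, est_y - true_y) * pixel_nm
print(f"localization error: {err_nm:.1f} nm "
      f"(spot FWHM ~ {2.355 * sigma * pixel_nm:.0f} nm)")
```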
https://en.wikipedia.org/wiki/Super-resolution_imaging