Citing Gaussian splatting as the reason we don't need lidar depth is a great example of Musk-esque technobabble: seemingly correct on the surface, but nonsense to any practitioner. One of the biggest problems with all SfM techniques is that the results are scale-ambiguous, so they do not in fact recover the crucial real-world depth measurement you get from lidar.
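To make the scale ambiguity concrete, here is a minimal sketch in plain Python. The `project` and `observe` helpers, the focal length, and the point coordinates are all made up for illustration, and it assumes an idealized pinhole camera with no rotation. Scaling the whole scene and the camera motion by the same factor leaves every pixel observation unchanged, so images alone cannot tell the two worlds apart.

```python
import math

def project(point, cam_pos, focal_px=1000.0):
    """Idealized pinhole projection of a 3D point seen from cam_pos, looking along +z."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal_px * x / z, focal_px * y / z)

def observe(points, cams):
    """Pixel coordinates of every point from every camera position."""
    return [[project(p, c) for p in points] for c in cams]

def scale(v, s):
    return tuple(s * a for a in v)

# Hypothetical scene: three points, and a camera that moves 0.5 units sideways.
points = [(1.0, 0.5, 10.0), (-2.0, 1.0, 25.0), (0.3, -0.7, 6.0)]
cams = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]

s = 3.0  # make the world and the camera motion three times bigger
original = observe(points, cams)
scaled = observe([scale(p, s) for p in points], [scale(c, s) for c in cams])

# Compare every pixel coordinate between the two worlds.
same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for view_o, view_s in zip(original, scaled)
    for px_o, px_s in zip(view_o, view_s)
    for a, b in zip(px_o, px_s)
)
print(same)
```

The check holds for any positive factor `s`, which is exactly why something outside the images has to pin the scale down.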
Now you might say "use a depth model to estimate metric depth". Spend 5 minutes thinking about why a magic math box that pretends to recover real depth from a single 2D image is a very, very sketchy proposition when it has to be correct for emergency braking rather than for some TikTok bokeh filter, and you will see that this doesn't get you far either.
This is not really true if you have multiple cameras with a known baseline, or well-known motion characteristics like you get from an accelerometer + wheel speed.
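For the known-baseline case, a rough sketch of the standard rectified-stereo relation Z = f * B / d shows where the metric scale comes from. The function name and the numbers are hypothetical, and it assumes an idealized rectified camera pair where the disparity has already been matched correctly.

```python
def stereo_depth_m(focal_px, baseline_m, disparity_px):
    """Metric depth from a rectified stereo pair: Z = f * B / d.

    focal_px     -- focal length in pixels
    baseline_m   -- distance between the two cameras in metres (the known scale)
    disparity_px -- horizontal pixel shift of the same point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Illustrative values only: 1000 px focal length, 30 cm baseline, 6 px disparity.
depth = stereo_depth_m(focal_px=1000.0, baseline_m=0.3, disparity_px=6.0)
print(f"{depth:.1f} m")
```

The baseline is the only quantity here that carries real-world units, so the depth is only as metric as the baseline calibration.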