> but monocular depth estimation was spectacularly good by 2021
It's still rather weak, and true monocular depth estimation really wasn't spectacularly anything in 2021. It's fundamentally ill-posed, and any priors you use to get around that will come back to bite you in the long tail of things some driver will encounter on the road.
The way it got good is by using camera overlap in space and over time while in motion to figure out metric depth over the entire image. Which is, humorously enough, sensor fusion.
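To make that concrete, here is a toy sketch, assuming an ideal pinhole camera and two rectified views a known distance apart (either two overlapping cameras, or one camera at two moments with the baseline taken from ego-motion). Every name and number in it is made up for illustration; it is not how any production stack does it. It shows that a single view cannot tell a scene from a 4x scaled copy of it, while a second view with a known baseline recovers metric depth.

```python
FOCAL_PX = 1000.0   # assumed focal length, in pixels
BASELINE_M = 0.5    # assumed distance between the two viewpoints, in metres


def project(point, focal_px=FOCAL_PX):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel offsets."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)


def disparity(point, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Horizontal pixel shift of a point between two views offset along x."""
    x, y, z = point
    u_first, _ = project((x, y, z), focal_px)
    u_second, _ = project((x - baseline_m, y, z), focal_px)
    return u_first - u_second


def depth_from_disparity(disparity_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Metric depth from disparity for rectified views: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


near = (1.0, 0.5, 10.0)   # a point 10 m away
far = (4.0, 2.0, 40.0)    # the same point with the whole scene scaled 4x

# One view: both points land on the same pixel, so depth is ambiguous.
same_pixel = project(near) == project(far)

# Two views with a known baseline: the disparities differ, and depth follows.
near_depth = depth_from_disparity(disparity(near))
far_depth = depth_from_disparity(disparity(far))
```

The known baseline is what supplies the metric scale; take it away and you are back to relying on priors.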
It was spectacularly good before 2021; 2021 is just when I noticed that it had become spectacularly good. 7.5 billion miles later, this appears to have been the correct call.