I don't really understand why this is so hard or why it wasn't just done from the get-go.
Just have Apple and Google digitally sign videos and photos recorded on phones, and then have Google, Meta, etc. display that they are authentic when shown on their platforms.
You're talking about the metadata of the files, which can always be edited, and someone will inevitably make software to do exactly that. Also, Adobe's proposal for handling generated content is exactly this, and they're not able to get buy-in from other companies.
It becomes a hard problem quickly when you introduce editing, and most photos and videos on social media are edited. I'm not sure how it would work. It seems more feasible than universal watermarks, though.
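To make the editing problem concrete, here is a rough sketch of why a capture-time signature stops verifying after any edit. It is not how any vendor actually does it: `DEVICE_KEY`, `sign`, and `verify` are made-up names, and HMAC from the Python standard library stands in for a real asymmetric signature (a real scheme would presumably keep a private key in the phone's secure hardware).

```python
import hashlib
import hmac

# Hypothetical device key. A real scheme would use an asymmetric key pair
# held in secure hardware; HMAC is only a stdlib stand-in for this sketch.
DEVICE_KEY = b"hypothetical-device-key"


def sign(media: bytes) -> str:
    """Return a tag computed over the exact bytes of the media file."""
    return hmac.new(DEVICE_KEY, media, hashlib.sha256).hexdigest()


def verify(media: bytes, tag: str) -> bool:
    """Check that the bytes still match the tag issued at capture time."""
    return hmac.compare_digest(sign(media), tag)


original = b"raw pixels straight from the sensor"
tag = sign(original)

# Any edit (crop, filter, re-encode) changes the bytes.
edited = original + b" + crop and filter"

print(verify(original, tag))  # True
print(verify(edited, tag))    # False
```

So as far as I can tell, a platform would either have to flag every edited photo as unverified or trust whatever editing software re-signs the result, and that second option is where the buy-in problem comes back.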
It's pretty much impossible to do this in a useful way, _and_ it would also hand those companies even more control over the media landscape.