There is always one party "in control" of the lexicon and its canonical version.
I think it's important to distinguish this from the "every client adds their own features" thing. Technically yes, each app can add their own things to the open union that they support better. But it's also on each implementer to consider how this would affect UX in other clients (e.g. if you add your own embed type, it seems reasonable to also prepopulate a link embed that acts as fallback). The problems you're describing are real, but I think we should give a bit more credit to the app builders since they're also aware that this is a part of their user experience design.
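To make the fallback idea concrete, here is a rough sketch of how a client might render an open union and degrade gracefully. Everything in it is an assumption for illustration: the type names (`example.embed.link`, `com.example.poll`), the `fallback` field, and the `render_embed` helper are hypothetical and not taken from any real lexicon.

```python
# Hypothetical type identifier for a plain link embed (not a real lexicon name).
LINK_TYPE = "example.embed.link"

# Renderers this particular client knows about, keyed by the "$type" discriminator.
KNOWN_RENDERERS = {
    LINK_TYPE: lambda e: f"[link] {e['uri']}",
}


def render_embed(embed, renderers=KNOWN_RENDERERS):
    """Render an open-union embed, falling back if its type is unknown."""
    renderer = renderers.get(embed.get("$type"))
    if renderer is not None:
        return renderer(embed)
    # Assumption: the authoring app prepopulated a "fallback" embed.
    fallback = embed.get("fallback")
    if fallback is not None:
        return render_embed(fallback, renderers)
    return "[unsupported embed]"


# A custom embed type that carries a link embed as its fallback.
poll = {
    "$type": "com.example.poll",
    "question": "Tabs or spaces?",
    "fallback": {"$type": LINK_TYPE, "uri": "https://example.com/poll/1"},
}

print(render_embed(poll))
```

A client that has no poll renderer still shows the link, while a client that registers one for `com.example.poll` would show the richer view. Whether the fallback lives inside the custom embed or alongside it is a design choice this sketch doesn't try to settle.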
But still, whoever "owns" the lexicon says what's canonical. Then yes, some other software might not catch up to what's canonical, but that's similar to what's happening with any platform that supports multiple clients today. Unless your outlook is that alternative clients in general are not competitive for this reason. I think that's a grim outlook, and if that were true, services wouldn't go to extra lengths to intentionally shut down their APIs, which so far has been the trend with every network.
I think in the longer term the bet is that the benefits unlocked by interop and a more competitive product landscape will become clearer to end users, who will be less interested in joining closed platforms and will develop some intuitions around that. This won't happen soon, so until then, the bet is that interop will allow creating better products. And if that doesn't happen, yes, it's pretty hard for open to compete.
Well, I never personally formed a strong opinion on Moxie's take, although I do understand it. Basically yes, his outlook is that any service that doesn't actively ban alternative clients will be outcompeted by those that do.
The reason is that if alt clients are possible then some fraction of the userbase will adopt them. And if some users adopt them that means the experience of other users of the service gets worse, because new features become unreliable and flaky. You think you understand what another person sees and can do, but you don't, and this leads to poor experiences.
Viewed another way, the client is an integrated part of the platform and not something that you can enable users to change freely, any more than they could be allowed to change the servers freely. We don't allow the latter because users would do things that broke the service, and so it is also for the former.
Empirically Moxie seems to be correct. Of the 90s-era open protocols, the only ones that survive are the web and email. The web survives but it's never been truly open in an extension sense - the definition of HTML has always been whatever the dominant browser of the era accepts, and this has become more so over time, not less. There are no more plugin APIs, for instance. SMTP survives, barely, because it's the foundation of internet identity. But many people even in corporate contexts now never send an email. It's all migrated to Slack or Teams. And if you look carefully at the Slack API, it's not possible to make a truly complete alternative client with it; the API is only intended for writing bots.
This is grim but I'm not sure it's false and I'm not sure it can be changed. Also, Moxie's essay ends on a positive note. He observes that competition between mobile social networks does still work well despite the lack of federation, because they coalesced around using the user's phone number as identity and address book as the friends list, so you can in fact port your social network to a different network just by signing up. The notification center in the OS provides the final piece of the puzzle, acting as a unified inbox that abstracts the underlying social network.
This is rather mobile-specific but seems basically correct to me. So that suggests the key pillar isn't file formats or protocols but ownable identity. It works because telcos do the hard work of issuing portable identities and helping people keep them, and ownership can be swiftly verified over the internet.