Would it make sense to archive every word every person ever speaks? At what point does archiving everything people do constrain their ability to live freely in the present?
Despite being in written form (decreasingly so), social media feels more like a private conversation in a public space - and like all such conversations, it deserves the right to decay, so that we do not all become prisoners of the dumbest thing we ever said.
The transformative work of curation - choosing which pieces to save, to turn into books, diary entries, or blog posts that record context for posterity - is a valid part of how archivists build the corpus of history. Harvesting all the raw data simply because we can is a dangerous road.
HTTP is not designed for mirroring.
FTP was easy to mirror with "lftp> mirror -p".
Easy mirroring and archive-level maintenance (let's say the network always maintains at least 3 copies) should be built into the "social media" protocols.
Well, if only we could still archive full Instagram profiles, for example ...
TL;DR: the actual (formally) hard problems:
Defining archive boundaries in a dense social graph (graph traversal + stopping criteria without exploding scope)
Entity resolution across pseudonymous accounts
Reconstructing opaque ranking algorithms from outputs
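To make the first item concrete, here is a rough sketch of one possible stopping rule: a breadth-first walk from a seed account that stops at a hop limit or an account budget, whichever comes first. The names (`bounded_archive_set`, `max_hops`, `max_accounts`) and the dict-of-lists graph are made up for illustration, and picking those two limits is exactly the part that stays hard.

```python
from collections import deque


def bounded_archive_set(graph, seed, max_hops=2, max_accounts=1000):
    """Collect accounts reachable from `seed` within `max_hops`,
    never returning more than `max_accounts` accounts in total.

    `graph` is a hypothetical adjacency mapping: account -> iterable of
    accounts it interacts with (follows, replies, mentions, ...).
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        account, hops = queue.popleft()
        if hops == max_hops:
            continue  # boundary reached: keep the account, do not expand it
        for neighbour in graph.get(account, ()):
            if neighbour in seen:
                continue
            if len(seen) >= max_accounts:
                return seen  # budget exhausted: stop before scope explodes
            seen.add(neighbour)
            queue.append((neighbour, hops + 1))
    return seen


# Toy graph, purely illustrative.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}
print(sorted(bounded_archive_set(toy, "a", max_hops=1)))
```

On a dense graph the hop limit alone does little, since two hops from a popular account can already cover a huge share of the network, which is why the sketch also carries a hard budget.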
Good that they actually raise the question of users not wanting to be archived. I think the semi-ephemerality of channel-based systems like Discord is increasingly popular partly because of various sorts of "cancel wars": the capture and use of posts out of context, whether well- or ill-intentioned.