Hacker News

Hard problems in social media archiving

40 points by surprisetalk, last Thursday at 3:19 PM, 6 comments

Comments

pjc50, today at 2:17 PM

Good that they actually raise the question of users not wanting to be archived. I think the semi-ephemerality of channel-based systems like Discord is increasingly popular partly because of various sorts of "cancel wars": the well- or ill-intentioned capture and use of posts out of context.

garethsprice, today at 5:00 PM

Would it make sense to archive every word every person ever speaks? At what point does archiving everything people do constrain their ability to live freely in the present?

Despite being in written form (decreasingly so), social media feels more like a private conversation in a public space - and like all such conversations, it deserves the right to decay, so that we do not all become prisoners of the dumbest thing we ever said.

The transformative work of curation - choosing which pieces to save, to turn into books, diary entries, or blog posts that record context for posterity - is a valid part of how archivists build the corpus of history. Harvesting all the raw data simply because we can is a dangerous road.

zoobab, today at 10:27 AM

HTTP is not designed for mirroring.

FTP was easy to mirror with "lftp> mirror -p".

Easy mirroring and archive-level maintenance (let's say the network always maintains at least 3 copies) should be built into the "social media" protocols.

binarykult, today at 11:36 AM

Well, if only we could still archive full Instagram profiles, for example ...

CGMthrowaway, today at 4:43 PM

TLDR The actual (formally) hard problems:

  Defining archive boundaries in a dense social graph (graph traversal + stopping criteria without exploding scope)
  Entity resolution across pseudonymous accounts 
  Reconstructing opaque ranking algorithms from outputs
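
To make the first item concrete, here is a rough sketch of what "graph traversal + stopping criteria" could look like. It is only an illustration, not anything taken from the article: the function name `bounded_crawl`, the parameters `max_hops` and `max_accounts`, and the toy graph are all made up for the example, and a hop limit plus an account budget is just one possible pair of stopping rules.

```python
from collections import deque


def bounded_crawl(graph, seed, max_hops=2, max_accounts=1000):
    """Breadth-first walk from `seed` with two stopping criteria.

    `graph` maps an account name to the accounts it links to.
    Returns a dict of account -> hop distance from the seed.
    Both limits are illustrative assumptions, not a recommended policy.
    """
    visited = {seed: 0}
    queue = deque([seed])
    while queue:
        account = queue.popleft()
        hops = visited[account]
        if hops == max_hops:
            # Hop limit reached: keep the account, but do not expand it.
            continue
        for neighbor in graph.get(account, ()):
            if neighbor in visited:
                continue
            if len(visited) >= max_accounts:
                # Account budget exhausted: stop the whole crawl.
                return visited
            visited[neighbor] = hops + 1
            queue.append(neighbor)
    return visited


# Hypothetical toy graph, for illustration only.
toy_graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

archive_scope = bounded_crawl(toy_graph, "alice", max_hops=2, max_accounts=10)
```

With these limits, "frank" sits three hops from the seed and falls outside the archive. The hard part named above is that any such cutoff is arbitrary in a dense graph: raise `max_hops` by one and the scope can grow by orders of magnitude.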