TLDR The actual (formally) hard problems:
Defining archive boundaries in a dense social graph (graph traversal + stopping criteria without exploding scope)
Entity resolution across pseudonymous accounts
Reconstructing opaque ranking algorithms from outputs