
dredmorbius, yesterday at 2:56 PM

Oh, that is clever!

I'd also suspect that there are networks / links which are more likely signs of low-value content than others. Off the top of my head, crypto, MLM, known scam/fraud sites, and perhaps share links to certain social networks might be negative indicators.


Replies

marginalia_nu, yesterday at 3:10 PM

You can actually identify clusters of websites based on the cosine similarity of their outbound links. Pretty useful for identifying content farms spanning multiple websites.

Have a lil' data explorer for this: https://explore2.marginalia.nu/

Quite a lot of dead links in the dataset, but it's still useful.
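
For anyone curious what clustering by cosine similarity of outbound links might look like, here is a minimal sketch. It is not Marginalia's actual implementation. It assumes each site is reduced to the set of domains it links to (a binary vector), and the names `cluster_sites`, `threshold`, and the `.example` domains are all made up for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary link vectors stored as sets."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the size of the intersection,
    # and each norm is the square root of the set size.
    return len(a & b) / math.sqrt(len(a) * len(b))


def cluster_sites(outlinks: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Group sites whose outbound-link sets are similar.

    Hypothetical helper: sites are joined whenever their pairwise
    similarity meets the threshold (single-link clustering, union-find).
    """
    parent = {site: site for site in outlinks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # O(n^2) pairwise comparison; fine for a toy, too slow for a real crawl.
    for s, t in combinations(outlinks, 2):
        if cosine_similarity(outlinks[s], outlinks[t]) >= threshold:
            parent[find(s)] = find(t)

    groups: dict[str, set[str]] = {}
    for site in outlinks:
        groups.setdefault(find(site), set()).add(site)
    return list(groups.values())


# Toy data, entirely invented for the example.
outlinks = {
    "farm-a.example": {"ads.example", "shop.example", "tracker.example"},
    "farm-b.example": {"ads.example", "shop.example", "tracker.example", "cdn.example"},
    "blog.example": {"wikipedia.org", "arxiv.org"},
}
clusters = cluster_sites(outlinks)
```

With this toy data the two "farm" sites share three of their outbound domains and end up in one cluster, while the blog stays on its own. The 0.5 threshold is arbitrary; a real dataset would need tuning and some form of indexing to avoid comparing every pair.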