Anchor text information is arguably a better source for relevance ranking in my experience.
I publish exports of the ones Marginalia is aware of[1] if you want to play with integrating them.
[1] https://downloads.marginalia.nu/exports/ (grab 'atags-25-04-20.parquet')
Very interesting, and it is very kind of you to share your data like that. Will review!
Though I'd think that you'd want to weight unaffiliated sites' anchor text to a given URL much higher than an affiliated site's.
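A minimal sketch of that weighting idea, for anyone who wants to experiment. It assumes links arrive as (source URL, target URL, anchor text) tuples. The export's actual columns aren't described here, so that shape is an assumption. The weights are made up, and "same hostname, ignoring www." is only a crude stand-in for affiliation. This is not how Marginalia ranks anything.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical weights: an unaffiliated site's anchor text counts
# ten times as much as an affiliated site's.
UNAFFILIATED_WEIGHT = 1.0
AFFILIATED_WEIGHT = 0.1


def _host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' stripped."""
    h = (urlparse(url).hostname or "").lower()
    return h[4:] if h.startswith("www.") else h


def is_affiliated(source_url: str, target_url: str) -> bool:
    """Crude affiliation proxy (an assumption): same hostname."""
    return _host(source_url) == _host(target_url)


def weighted_anchor_terms(links):
    """Sum per-term weights of the anchor text pointing at each target URL.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    Returns {target_url: {term: summed_weight}}.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for source, target, text in links:
        if is_affiliated(source, target):
            weight = AFFILIATED_WEIGHT
        else:
            weight = UNAFFILIATED_WEIGHT
        for term in text.lower().split():
            scores[target][term] += weight
    return {target: dict(terms) for target, terms in scores.items()}


if __name__ == "__main__":
    links = [
        ("https://blog.example.org/post", "https://example.com/", "small web search engine"),
        ("https://www.example.com/about", "https://example.com/", "search engine home"),
    ]
    # "search" and "engine" get 1.0 from the blog plus 0.1 from the site itself.
    print(weighted_anchor_terms(links)["https://example.com/"])
```

A real system would more likely compare registrable domains via the Public Suffix List, and fold in ownership signals such as shared analytics or ad IDs, which is exactly where "affiliation" gets slippery.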
"Affiliation" is itself a tricky term. Content farms were popular in the aughts (though they seem to have largely subsided), as were firms such as Claria and Gator. There are chumboxes (Outbrain, Taboola), and of course affiliate links (e.g., to Amazon or other shopping sites). SEO manipulation is a whole universe of its own.
(I'm sure you know far more about this than I do, I'm mostly talking at other readers, and maybe hoping to glean some more wisdom from you ;-)