
Show HN: Open database of link metadata for large-scale analysis

10 points | by renegat0x0 | last Saturday at 4:53 PM | 1 comment

I would like to share an open database focused on link-level metadata extraction and aggregation, which may be of interest to researchers.

The project maintains a structured dataset of links enriched with metadata such as:

- page title

- description / summary

- publication date (when available)

- thumbnail / preview image

- etc.
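
To make the shape of a record concrete, here is a minimal Python sketch of a single enriched entry and how it could be loaded. Both the JSON layout and the field names (link, title, description, date_published, thumbnail) are assumptions for illustration; the repository defines the actual schema.

    import json

    # Hypothetical example of one enriched link record. The field names here
    # are assumptions for illustration; consult the repository for the real layout.
    record_json = """
    {
        "link": "https://example.com/article",
        "title": "Example article",
        "description": "Short summary of the page.",
        "date_published": "2025-03-14T09:00:00Z",
        "thumbnail": "https://example.com/preview.png"
    }
    """

    record = json.loads(record_json)
    print(record["title"], record["date_published"])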

The goal is to provide a reusable, inspectable set of link metadata that can be used for experiments in areas such as:

- RSS and feed analysis

- news analysis

- link rot analysis (a crude check is sketched after this list)
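
As a rough sketch of that last item, a basic link rot check over URLs pulled from the dataset could look like the following. The urls list is hypothetical, and treating any non-error HTTP response as "alive" is a simplification.

    import urllib.error
    import urllib.request

    def check_link(url: str, timeout: float = 10.0) -> bool:
        """Return True if the URL still answers with a non-error HTTP response."""
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status < 400
        except (urllib.error.URLError, TimeoutError):
            return False

    # Hypothetical URLs taken from the dataset.
    urls = ["https://example.com/article", "https://example.org/gone"]
    dead = [u for u in urls if not check_link(u)]
    print(f"{len(dead)} of {len(urls)} links appear dead")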

The database is publicly available here:

https://github.com/rumca-js/RSS-Link-Database-2025

There are also databases for previous years.


Comments

Aherontas | yesterday at 11:59 AM

Curious how you handle feed evolution over time. When an RSS source changes structure (fields added/removed, summaries truncated, etc.), do you normalize to a fixed schema or store the raw payload alongside a best-effort normalized version? Longitudinal datasets tend to get tricky there.
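
For illustration only, the pattern the comment asks about (preserve the raw payload as fetched, derive a best-effort view onto a fixed schema) might be sketched like this; the schema and field names are hypothetical and not taken from the project.

    from dataclasses import dataclass, field
    from typing import Any

    # Hypothetical fixed schema; the project's actual fields may differ.
    FIXED_SCHEMA = ("link", "title", "description", "date_published", "thumbnail")

    @dataclass
    class StoredEntry:
        """One feed item kept in two forms: the raw payload as fetched,
        plus a best-effort projection onto the fixed schema."""
        raw: dict[str, Any]
        normalized: dict[str, Any] = field(default_factory=dict)

    def normalize(raw: dict[str, Any]) -> StoredEntry:
        # Missing fields become None, so the analysis schema stays stable
        # even when a source adds, removes, or truncates fields over time.
        return StoredEntry(raw=raw, normalized={k: raw.get(k) for k in FIXED_SCHEMA})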