Thousands of systems, from Google to script kiddies to OpenAI to Nigerian call scammers to cybersecurity firms, actively watch the certificate transparency logs for exactly this reason. Yawn.
The Web Archive also uses the Certificate Transparency logs; some websites that aren't linked anywhere end up in the Wayback Machine this way: https://archive.org/details/certificate-transparency?tab=abo...
With that said, given that (1) pre-certificates in the log are big and (2) certificate lifetimes are shortening, so there will be a lot of duplicates, it seems like it would be good for someone to make a feed of just the new domain names.
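A rough sketch of what the dedup step of such a feed might look like, assuming something upstream has already parsed each log entry into its list of DNS names (that parsing is not shown, and `new_domain_names` is a made-up name, not an existing tool):

```python
from typing import Iterable, Iterator, Set


def new_domain_names(entries: Iterable[Iterable[str]], seen: Set[str]) -> Iterator[str]:
    """Yield each DNS name only the first time it appears.

    `entries` is assumed to be an iterable of per-certificate name lists;
    `seen` is kept by the caller so it persists across batches.
    """
    for names in entries:
        for name in names:
            # Normalize case and a trailing dot so renewals count as duplicates.
            key = name.strip().rstrip(".").lower()
            if key and key not in seen:
                seen.add(key)
                yield key


seen_names: Set[str] = set()
batch = [["example.com", "www.example.com"], ["Example.com."], ["new.example.org"]]
print(list(new_domain_names(batch, seen_names)))
# prints ['example.com', 'www.example.com', 'new.example.org']
```

An in-memory set would not hold up at real CT log volume; a real feed would presumably need a disk-backed store or a probabilistic filter, but the idea is the same.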
"... for exacty this reason."
Needs clarification. What reason?
Certificate Transparency started as a Google project. They don’t need to scrape it: Google operates some of the logs itself, alongside logs run by other organizations. It’s one of those projects that Google runs because it thinks it genuinely improves the internet by reducing certificate authority abuse.
For those who have never looked at the CT logs: https://crt.sh/?q=ycombinator.com
(the site may occasionally fail to load)