Hacker News

londons_explore · 05/14/2025

Surely most AI trawlers have special support for git and just clone the repo once?


Replies

Macha · 05/14/2025

The AI companies could do work or they could not do work.

They've pretty widely chosen to not do work and just slam websites from proxy IPs instead.

You would think they'd use their own products to do that work, if those products worked as well as advertised...

ikiris · 05/15/2025

I think you vastly overestimate the average dev and how much they care about handling special cases that are mostly other people's aggregate problem.

1oooqooq · 05/15/2025

Not if you vibe-coded your crawler.

NBJack · 05/15/2025

Apparently, the vibe coding session didn't account for it. /s

I would more readily assume a large social networking company filled with bright minds would have worked out some kind of agreement on, say, a large corpus of copyrighted training data before using it.

It's the Wild West right now. Data is king for AI training.