Hacker News

greatgib, yesterday at 1:40 PM (2 replies)

[flagged]


Replies

pushcx, yesterday at 2:37 PM

The scrapers will not attempt to discover and use an efficient representation. They will attempt to hit every URL they can discover on a site, and they'll do it at hundreds of hits per second, spread across enough IPs that each one requests only about once a minute. It's rude to talk down to people for not implementing a technique that you can't get scrapers to adopt, and for matching their investment in performance to their needs instead of accurately predicting years beforehand that traffic would dramatically change.
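
To put rough numbers on that, here is a back-of-the-envelope sketch. The 200 hits/s figure is an assumed stand-in for "hundreds", the 10 requests/minute limit is a made-up example threshold, and both function names are hypothetical:

```python
import math


def ips_needed(total_hits_per_second: float, per_ip_hits_per_minute: float) -> int:
    """Distinct IPs required to sustain the aggregate rate
    while each IP stays at the given per-minute rate."""
    total_hits_per_minute = total_hits_per_second * 60
    return math.ceil(total_hits_per_minute / per_ip_hits_per_minute)


def blocked_by_per_ip_limit(per_ip_hits_per_minute: float,
                            limit_per_minute: float) -> bool:
    """True only if a single IP exceeds the per-IP limit."""
    return per_ip_hits_per_minute > limit_per_minute


# Assumed example values, not measurements.
pool_size = ips_needed(total_hits_per_second=200, per_ip_hits_per_minute=1)
print(pool_size)                           # size of the IP pool required
print(blocked_by_per_ip_limit(1, 10))      # does a 10/min per-IP limit catch it?
```

Under those assumptions the server still absorbs the full aggregate load, while no individual IP comes anywhere near a typical per-IP rate limit.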

xena, yesterday at 2:02 PM

I challenge you to take a critical look at the performance of things like phpBB and see how even naive scraping brings commonly deployed server CPUs to their knees.