Hacker News

throwaway48476 · 10/02/2024

The problem is that website owners want to have their cake and eat it too. They want to make data public, but not so public that it can be copied. It's the same contradiction as DRM, and it fails for the same reason: you can't hand people data and simultaneously stop them from copying it.

Web devs also bloat the hell out of sites with megabytes of JavaScript and overcomplicated designs. It would be far cheaper to just serve a static site through a CDN.
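To sketch the point (Python stdlib only; the port and cache lifetime are arbitrary placeholders): a static origin just hands out files with long-lived cache headers and lets the CDN absorb the repeat traffic.

    # Minimal sketch of a static site behind a CDN: long cache headers
    # mean the origin only ever sees cache misses.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class CachedStaticHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            # Let the CDN (and browsers) cache everything for a day.
            self.send_header("Cache-Control", "public, max-age=86400")
            super().end_headers()

    if __name__ == "__main__":
        # Serves the current directory on port 8000; point a CDN at it.
        HTTPServer(("", 8000), CachedStaticHandler).serve_forever()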


Replies

imiric · 10/02/2024

CAPTCHAs can protect public resources as well. But the main problem here is preventing generated spam content, not scraping, and that can be mitigated by placing CAPTCHAs only on pages with signup/login and comment forms.
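Roughly, in Flask terms (a sketch only; verify_captcha_token is a hypothetical stand-in for whatever CAPTCHA provider's verification call you'd actually use):

    from flask import Flask, abort, request

    app = Flask(__name__)

    # Only these endpoints accept user-generated content and need gating.
    PROTECTED_PREFIXES = ("/signup", "/login", "/comment")

    def verify_captcha_token(token):
        # Hypothetical stand-in: a real app would call its CAPTCHA
        # provider's verification API here.
        return bool(token)

    @app.before_request
    def gate_form_submissions():
        # POSTs to form endpoints require a solved CAPTCHA; every
        # read-only page stays fully public and un-gated.
        if request.method == "POST" and request.path.startswith(PROTECTED_PREFIXES):
            if not verify_captcha_token(request.form.get("captcha_token")):
                abort(403)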

kamray23 · 10/02/2024

That'd be a nice way of looking at it, if serving content were cheap. It is not. I want to put my CV online, but I'm not willing to shell out tens of thousands every year to have it scraped at gigabytes per day. Doesn't happen, you say? It didn't before, sure. But now there are so many scrapers building data sets that I've had to block entire ranges of IPs because of the money they repeatedly waste.
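The blocking itself is crude but cheap; a Python sketch (the CIDR ranges are RFC 5737 documentation placeholders, not real scraper networks):

    from ipaddress import ip_address, ip_network

    # Placeholder ranges; in practice these come out of your access logs.
    BLOCKED_RANGES = [
        ip_network("203.0.113.0/24"),
        ip_network("198.51.100.0/24"),
    ]

    def is_blocked(remote_addr: str) -> bool:
        # Reject any request whose source address falls in a banned range.
        addr = ip_address(remote_addr)
        return any(addr in net for net in BLOCKED_RANGES)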

It's like the classic "little lambda thing" someone posts on HN, only to find a $2 million invoice in their inbox a couple of weeks later. Except instead of going viral, your achievements get mulched by AI.
