Hacker News

snehesht yesterday at 12:36 PM

Why not simply blacklist or rate-limit those bot IPs?


Replies

xprnio yesterday at 1:17 PM

If you have real traffic and bot traffic, you still need to identify which is which. On top of that, bots very likely don’t reuse the same IPs over and over. If we knew ahead of time all the IPs used only by bots, then yeah, blacklisting them would be simple. But although it’s simple in theory, identifying what to blacklist in the first place is the hard part.

phyzome yesterday at 1:10 PM

Because punishment for breaking the robots.txt rules is a social good.

Bender yesterday at 5:48 PM

> Why not simply blacklist or rate-limit those bot IPs?

Many bots cycle through short DHCP leases on LTE Wi-Fi devices. One would have to accept blocking all cell phones, which I have done for my personal hobby crap, but most businesses will not do this. Another big swath of bots comes from Amazon EC2 and Google Cloud, which I will also happily block on my hobby crap, but most businesses will not.
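
As a rough illustration of blocking whole network ranges, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are placeholders taken from the reserved documentation ranges, not real carrier or cloud ranges, and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names; a real setup would presumably load the ranges that the providers publish.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real
# provider ranges. Swap in whatever ranges you actually want to block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # stand-in for a cloud provider range
    ipaddress.ip_network("198.51.100.0/24"),  # stand-in for a mobile carrier range
]

def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The trade-off is the one described above: a range-based block is cheap to evaluate, but it rejects every real visitor in that range along with the bots.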

Some bots are easier to block because they do not use real web clients and are missing some TCP/IP headers, which makes them ultra easy to block. Some also do not spoof the user-agent and are easy to block. Some will attempt to access URLs not visible to real humans, thus blocking themselves. Many bots cannot do HTTP/2, so they are also trivial to block. Pretty much anything not using headless Chrome is easy to block.
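
A few of these checks could be sketched roughly as follows. This is only an illustration under assumptions: the `Request` shape, the honeypot paths, and the user-agent prefixes are all made up for the example, and the TCP/IP-level check is left out because it happens below the HTTP layer.

```python
from dataclasses import dataclass

# Hypothetical trap paths, linked only from markup hidden to human visitors.
HONEYPOT_PATHS = {"/trap/", "/hidden-admin/"}

# Example prefixes of HTTP libraries that announce themselves honestly.
LIBRARY_UA_PREFIXES = ("curl/", "python-requests/", "go-http-client/")

@dataclass
class Request:
    path: str
    user_agent: str
    http_version: str  # e.g. "1.1" or "2"

def bot_signals(req: Request) -> list[str]:
    """Collect the cheap bot indicators that apply to this request."""
    signals = []
    ua = req.user_agent.strip().lower()
    # Empty or library-default user-agent: the client did not bother to spoof.
    if not ua or ua.startswith(LIBRARY_UA_PREFIXES):
        signals.append("unspoofed-user-agent")
    # Only something crawling hidden links should ever request these paths.
    if req.path in HONEYPOT_PATHS:
        signals.append("honeypot-path")
    # No HTTP/2 or newer: a weak signal on its own, since some real clients
    # still speak HTTP/1.1.
    if req.http_version not in ("2", "2.0", "3"):
        signals.append("no-http2")
    return signals
```

For example, `bot_signals(Request("/trap/", "curl/8.5.0", "1.1"))` reports all three signals, while a browser-like request over HTTP/2 reports none. A headless Chrome bot would pass every check here, which is the limitation noted above.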

aduwah yesterday at 1:04 PM

There are way too many to do that.

arbol yesterday at 2:12 PM

The AI companies are using virtually unlimited "clean" residential IPs so this is not a valid strategy.

nextlevelwizard yesterday at 5:52 PM

The point is to kill, or at least hinder, AI progress.

xyzal yesterday at 3:37 PM

For the lulz