Hacker News

jcims yesterday at 3:27 PM (4 replies)

Given that these are trivially forged, presumably they aren't really using a Mac for scraping, right? Is it just there to elicit a 'standard' end-user response from the server?

>useragent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; robots.txt;


Replies

snowwrestler yesterday at 3:53 PM

Right. Crawler user agent strings in general tend to include all sorts of legacy stuff for compatibility.

This actually is a well-behaved crawler user agent because it identifies itself at the end.

Hrun0 yesterday at 3:34 PM

Yes, it is very common to change your user agent for web scraping, mainly because some websites will block you based on that alone.

benjojo12 yesterday at 3:32 PM

The IP address this comes from is in an OpenAI search bot range:

> "ipv4Prefix": "74.7.175.128/25"

from https://openai.com/searchbot.json
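For anyone who wants to run the same check against their own logs, here is a minimal Python sketch using the standard-library `ipaddress` module. The prefix is the one quoted above. The function name and sample addresses are made up for illustration, and a real check would presumably load every prefix from the published JSON rather than hard-coding one.

```python
import ipaddress

# Prefix quoted above from the published list; hard-coded here for illustration.
SEARCHBOT_PREFIXES = [ipaddress.ip_network("74.7.175.128/25")]


def in_searchbot_range(addr: str) -> bool:
    """Return True if addr falls inside any of the listed prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in SEARCHBOT_PREFIXES)


# Hypothetical sample addresses, not taken from any real log.
print(in_searchbot_range("74.7.175.200"))  # inside the /25, which covers .128 to .255
print(in_searchbot_range("74.7.175.5"))    # outside the /25
```

Matching on source IP is harder to fake than the user agent string, which is why it is the more useful of the two checks.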

deathanatos yesterday at 7:50 PM

… the UA is malformed, even.

Makes me want to reconfigure my servers to just drop such traffic. If you can't be arsed to send a well-formed UA, I have doubts that you'll obey other conventions like robots.txt.
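One plausible reading of "malformed", assuming it refers to the `User-Agent` grammar in RFC 9110 (a product token, then whitespace-separated product tokens or parenthesised comments): the semicolons after `Safari/537.36` sit outside any comment, and `;` is not a valid token character. A rough Python sketch of that check follows. It is a simplification (no nested comments or backslash escapes), and the function name is hypothetical.

```python
import re

# token characters per RFC 9110 ("tchar"); note that ';' is not among them
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
PRODUCT = rf"{TOKEN}(?:/{TOKEN})?"
# Simplified comment: parentheses with no nesting and no backslash escapes.
COMMENT = r"\([^()\\]*\)"
UA_RE = re.compile(rf"{PRODUCT}(?:[ \t]+(?:{PRODUCT}|{COMMENT}))*")


def is_well_formed_ua(ua: str) -> bool:
    """Return True if ua matches the simplified User-Agent grammar above."""
    return UA_RE.fullmatch(ua) is not None


# The string as quoted at the top of the thread.
LOGGED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; "
    "OAI-SearchBot/1.3; robots.txt;"
)

print(is_well_formed_ua(LOGGED_UA))
# Hypothetical variant with the bot details moved inside a comment.
print(is_well_formed_ua("Mozilla/5.0 (compatible; OAI-SearchBot/1.3)"))
```

Dropping traffic on a check like this would also catch plenty of sloppy but harmless clients, so it is probably better suited to logging than to outright blocking.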