Given these are trivially forged, presumably they aren't really using a Mac for scraping, right? Just to elicit a 'standard' end user response from the server?
>useragent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; robots.txt;
Yes, it is very common to change your user agent for web scraping, mainly because some websites will block you based on that alone.
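As a minimal sketch of what that looks like in practice (using Python's stdlib `urllib`; the target URL is just a placeholder), you simply set the `User-Agent` header to a browser-like string instead of the library default:

```python
from urllib.request import Request

# Impersonate a desktop browser so a site can't block the request
# based on the default library user agent alone.
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/131.0.0.0 Safari/537.36"
)

req = Request("https://example.com/", headers={"User-Agent": browser_ua})

# urllib normalizes header names to "Capitalized" form internally.
print(req.get_header("User-agent"))
```

Nothing is actually fetched here; the point is only that the header is entirely caller-controlled, which is why it proves nothing about who is really connecting.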
The IP address this comes from is in an OpenAI search bot range:
> "ipv4Prefix": "74.7.175.128/25"
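That's the reliable way to verify the crawler: check the connecting IP against the published prefix rather than trusting the UA. A quick sketch with Python's stdlib `ipaddress` module (the sample addresses are made up for illustration):

```python
import ipaddress

# The prefix quoted above from OpenAI's published bot ranges.
oai_searchbot_net = ipaddress.ip_network("74.7.175.128/25")

def is_oai_searchbot(ip: str) -> bool:
    """True if the address falls inside the published crawler range."""
    return ipaddress.ip_address(ip) in oai_searchbot_net

print(is_oai_searchbot("74.7.175.130"))  # inside the /25 -> True
print(is_oai_searchbot("74.7.175.1"))    # outside the /25 -> False
```

Unlike the UA string, the source IP can't be trivially forged for a full TCP connection, so this check actually means something.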
… the UA is malformed, even.
Makes me want to reconfigure my servers to just drop such traffic. If you can't be arsed to send a well-formed UA, I have doubts that you'll obey other conventions like robots.txt.
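A toy version of that filter, purely illustrative: flag a UA as malformed if its parentheses are unbalanced or it ends in a dangling semicolon, which the quoted string does. Any real rule set would be a site-specific judgment call.

```python
def looks_malformed(ua: str) -> bool:
    """Crude sanity check; real filters would need far more care."""
    balanced = ua.count("(") == ua.count(")")
    dangling = ua.rstrip().endswith(";")
    return (not balanced) or dangling

quoted = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
          "AppleWebKit/537.36 (KHTML, like Gecko) "
          "Chrome/131.0.0.0 Safari/537.36; compatible; "
          "OAI-SearchBot/1.3; robots.txt;")

print(looks_malformed(quoted))  # True: the trailing ";" trips the check
```

Whether dropping such traffic outright is wise is another question; plenty of sloppy-but-legitimate clients would get caught too.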
Right. Crawler user agent strings in general tend to include all sorts of legacy stuff for compatibility.
This actually is a well-behaved crawler user agent because it identifies itself at the end.