In this case it actually is OpenAI: the IP (74.7.175.182) falls within one of their published ranges (74.7.175.128/25).
https://openai.com/searchbot.json
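Checking whether an IP falls inside a published range is a one-liner with Python's standard `ipaddress` module. A minimal sketch, hardcoding the range quoted above (in practice you'd fetch and parse the published JSON file rather than pin ranges by hand):

```python
import ipaddress

# Range taken from the comment above; OpenAI publishes the full list
# at https://openai.com/searchbot.json.
published_ranges = [ipaddress.ip_network("74.7.175.128/25")]

def is_published_crawler_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in published_ranges)

print(is_published_crawler_ip("74.7.175.182"))  # True: .128/25 covers .128-.255
print(is_published_crawler_ip("74.7.175.1"))    # False: outside the /25
```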
I don't know if imitating a major crawler is really worth it. It may work against very naive filters, but it's easy to check definitively whether a request is faking a crawler, so it just hands ammo to the more advanced filters that do check.
$ curl -I https://www.cloudflare.com
HTTP/2 200
$ curl -I -H "User-Agent: Googlebot" https://www.cloudflare.com
HTTP/2 403

Thanks for looking it up!
I don't have a statistic here, but I'm always surprised how many websites I come across that do limited user-agent and origin/referrer checks but don't maintain any kind of active IP-based tracking. If you're trying to build a site-specific scraper and are getting blocked, mimicking browser headers is an easy and often sufficient first step.
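A minimal sketch of that header-mimicking step using only the standard library; the URL and header values are illustrative placeholders, not tied to any particular site's filter:

```python
import urllib.request

# Typical browser-like headers; sites doing naive UA/referrer checks
# generally pass requests that carry these.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Referer": "https://example.com/",          # hypothetical origin page
    "Accept-Language": "en-US,en;q=0.9",
}

req = urllib.request.Request("https://example.com/page", headers=headers)
# urllib stores header names in capitalized form internally:
print(req.get_header("User-agent"))
# resp = urllib.request.urlopen(req)  # then fetch as usual
```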