Even if they haven't added any cache-control headers, what kind of lazy Meta engineer designed their crawler to just pull the same URL multiple times a second?
Is this where all that hardware for AI projects is going? To data centers that just uncritically hit the same URL over and over, without checking whether the content of a site or page has changed since the last visit and calculating a proper retry interval? Search engine crawlers 25–30 years ago could do this.
Hit the URL once per day; if it changes daily, try twice a day. If it hasn't changed in a week, maybe only retry twice per week.
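A minimal sketch of that adaptive schedule, assuming the crawler already knows (e.g. via an `ETag` or `If-Modified-Since` check) whether the page changed since the last fetch; the function name and the exact bounds are illustrative, not from any real crawler:

```python
from datetime import timedelta

def next_crawl_interval(current: timedelta, changed: bool) -> timedelta:
    """Adaptive revisit schedule: poll more often if the page changed,
    back off if it didn't. Bounds are illustrative choices."""
    if changed:
        # Content changed since the last visit: halve the interval,
        # but never poll more often than twice a day.
        return max(current / 2, timedelta(hours=12))
    # Unchanged: double the interval, capped at twice per week
    # (one visit every 3.5 days).
    return min(current * 2, timedelta(days=3, hours=12))
```

Even this toy version never degenerates into hammering one URL several times per second, which is the point: the feedback loop costs a few lines of code and a timestamp per URL.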
Forgejo does set "cache-control: private, max-age=21600", which is considerably more than one second, but I grant it uses the "private" keyword for no reason here.