I've just posted my question to the nginx community forum, after a lengthy debugging session with multiple LLMs. Right now, I believe it was the combination of multi_accept with an open_file_cache allowed to grow larger than worker_rlimit_nofile.
https://community.nginx.org/t/too-many-open-files-at-1000-re...
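In config terms, the footgun looks something like this; the values here are made up for illustration, not my actual config. multi_accept makes each worker accept every queued connection in one go, and open_file_cache pins descriptors, so a cache ceiling above the FD rlimit eventually trips "too many open files" under load:

    # hypothetical nginx.conf fragment showing the suspected combination
    worker_rlimit_nofile 65535;              # per-worker FD ceiling

    events {
        multi_accept on;                     # each worker drains the whole accept queue at once
    }

    http {
        # the cache can pin far more FDs than the rlimit allows -> EMFILE
        open_file_cache max=200000 inactive=60m;
        # fix: keep max comfortably below worker_rlimit_nofile, or raise the rlimit
    }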
Also, the servers were doing 200 Mbps, so I couldn't have kept up _much_ longer, no matter the limits.
One thing that might work for you is to actually create the empty tile file and hard-link it everywhere it needs to be. Then you don't need to special-case it at runtime; you handle it once at generation time.
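A minimal sketch of what I mean, assuming a /var/tiles/z/x/y.png layout (the paths are hypothetical):

    # render the empty tile exactly once at generation time...
    cp empty_tile.png /var/tiles/empty.png
    # ...then hard-link it into every coordinate that came out empty:
    # same inode, no duplicated data, nothing to special-case per request
    ln /var/tiles/empty.png /var/tiles/12/2048/1365.png
    ln /var/tiles/empty.png /var/tiles/12/2048/1366.png

One caveat: hard links can't cross filesystems, so this only works if all the tiles live on one mount (otherwise fall back to symlinks).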
NVMe disks are incredibly fast and 1k rps is not a lot (IIRC my N100 seems capable of ~40k, were it not for the 1 Gbit NIC bottlenecking). I'd try benchmarking without the tuning options you've got. Do you actually get 40k concurrent connections from Cloudflare? If your connections to the upstream are kept alive (so no constant slow starts), ideally you have numCores workers, each doing one thing at a time, and that's enough to max out your NIC. You only add concurrency if latency prevents you from maxing out bandwidth.
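Concretely, this is the baseline shape I'd benchmark before any exotic tuning; the upstream name and numbers are placeholders, and the proxy part only applies if you aren't serving straight off disk:

    worker_processes auto;                       # one worker per core

    http {
        upstream tile_backend {
            server 127.0.0.1:8080;
            keepalive 16;                        # reuse upstream connections: no repeated handshakes/slow starts
        }

        server {
            location /tiles/ {
                proxy_pass http://tile_backend;
                proxy_http_version 1.1;          # upstream keepalive needs HTTP/1.1
                proxy_set_header Connection "";  # don't forward "Connection: close"
            }
        }
    }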
> so I couldn't have kept up _much_ longer, no matter the limits.
Why would that kind of rate cause a problem over time?
I'm pretty sure your open file cache is way too large. If you're doing 1k/sec and you cache file descriptors for 60 minutes, assuming those are all unique files, that's asking for 3.6 million FDs to be cached when you've only got 1 million available. I've never used nginx or open_file_cache [1], but I would tune it way down and see if you even notice a difference in performance in normal operation. Maybe 10k files, 60s timeout.
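i.e. something like this, going off the docs rather than experience:

    # way down from the multi-hour, million-file settings:
    open_file_cache       max=10000 inactive=60s;  # cap ~10k cached FDs, drop after 60s idle
    open_file_cache_valid 60s;                     # recheck cached entries every 60s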
> Also, the servers were doing 200 Mbps, so I couldn't have kept up _much_ longer, no matter the limits.
For cost reasons or system overload?
If system overload: what kind of storage? Are you monitoring disk I/O? What kind of CPU is in the system? I used to push almost 10 Gbps of HTTPS on dual E5-2690s [2], though that was a larger file. The 2690s were high end for their day, but anything more modern has much better AES acceleration and should do far better than 200 Mbps almost regardless of what it is.
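For the disk question, even something as simple as this (iostat ships with the sysstat package) will show whether the drives are anywhere near saturation:

    # extended per-device stats, refreshed every second:
    # watch %util and the await columns
    iostat -x 1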
[1] To be honest, I'm not sure I understand the intent of open_file_cache... opening files is usually not that expensive; maybe it matters at hundreds of thousands of rps, or on a very complex filesystem. PS: don't put tens of thousands of files in one directory. Everything works better if you take your ten thousand files and put one hundred files into each of one hundred directories. You can experiment to see what works best with your load, but a tree with N layers of M directories, where the last layer holds M files each, is a good plan, with 64 <= M <= 256 (quick sketch at the bottom of this comment). The goal is keeping each directory compact so searching and editing stay efficient.
[2] https://www.intel.com/content/www/us/en/products/sku/64596/i...
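To make the fanout in [1] concrete, a bash sketch with M=256, i.e. two hex characters per directory level (keying on the filename is just one easy way to spread files evenly):

    f="a3f91b20c4.png"
    d1="${f:0:2}"; d2="${f:2:2}"        # first two levels: "a3", "f9"
    mkdir -p "tiles/$d1/$d2"
    mv "$f" "tiles/$d1/$d2/$f"          # -> tiles/a3/f9/a3f91b20c4.png

Two layers of 256 directories holding 256 files each already covers ~16M files while every directory stays small.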