One thing that might work for you is to actually make the empty tile file, and hard link it everywhere it needs to be. Then you don't need to special case it at runtime, but instead at generation time.
NVMe disks are incredibly fast and 1k rps is not a lot (IIRC my n100 seems to be capable of ~40k if not for the 1 Gbit NIC bottlenecking). I'd try benchmarking without the tuning options you've got. Like do you actually get 40k concurrent connections from cloudflare? If you have connections to your upstream kept alive (so no constant slow starts), ideally you have numCores workers and they each do one thing at a time, and that's enough to max out your NIC. You only add concurrency if latency prevents you from maxing bandwidth.
Yes, that's a good idea. But we are talking about 90+% of the titles being empty (I might be wrong on that), that's a lot of hard links. I think the nginx config just need to be fixed, I hope I'll receive some help on their forum.