The last thing a proper benchmark should do is reveal its own API key.
IMO it should have a third party run the LLM anyway. Otherwise the company being evaluated could notice it's receiving the same requests every day and discover the benchmark that way.
That's a good thought I hadn't had, actually.