The eval platform is a game changer.
It's nice to have a solution from OpenAI given how much they use a variant of this internally. I've tried like 5 YC startups and I don't think anyone's really solved this.
There's the very real risk of vendor lock-in, but from a quick scan of the docs it seems like a pretty portable implementation.