The closed nature is one thing, but the opaque billing on reasoning tokens is the real dealbreaker for integration. If you are bootstrapping a service, I don't see how you can model your margins when the API arbitrarily decides how long to think about a prompt and bills you for it. It makes unit economics impossible to predict.
FYI: Newer LLM hosting APIs offer control over the amount of "thinking" (as well as the length of the reply) -- some by token count, others by an enum (low, medium, high, etc.).
You just have to plan for the worst case.
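Once both caps are set, the worst case is straightforward to price. A minimal back-of-envelope sketch, provider-agnostic: the "thinking budget" knob and the $3/$15 per-Mtok prices below are placeholder assumptions, not any real rate card (Anthropic exposes something like this as budget_tokens, OpenAI as a reasoning_effort enum).

    def worst_case_cost(prompt_tokens: int,
                        thinking_budget: int,
                        max_output_tokens: int,
                        input_price_per_mtok: float,
                        output_price_per_mtok: float) -> float:
        """Upper bound on spend for one request: assume the model burns
        the entire thinking budget plus the full reply allowance, with
        reasoning tokens billed at the output rate (the common scheme).
        Prices are hypothetical per-million-token figures."""
        input_cost = prompt_tokens / 1e6 * input_price_per_mtok
        output_cost = (thinking_budget + max_output_tokens) / 1e6 * output_price_per_mtok
        return input_cost + output_cost

    # Example: 2k prompt, 8k thinking cap, 1k reply cap at $3/$15 per Mtok
    # gives a hard ceiling of ~$0.14 per call to plug into a margin model.
    print(worst_case_cost(2_000, 8_000, 1_000, 3.0, 15.0))

That ceiling is what you budget against; anything the model leaves on the table is upside.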
Doesn't ClosedAI do the same? Their thinking models bill for the reasoning tokens, but the thinking steps themselves come back encrypted.