Surprised models still output tools as text when for ages we’ve been able to constrain the output at the inference engine level and constrain the model what tools, parameters etc are available
Edit: found it, it’s called Grammar-Constrained Decoding (GCD)
Some model providers when using json_schema: true (eg. with_structured_output), it does constrain the output.
constrained decoding tends to make models dumber - this is why it's rarely used
I imagine the challenge comes from recognizing that your model is trying to call a tool before it actually has and only constraining output then. Running a separate pass for an optionally-empty list of tools afterwards may work, but maybe constraining its output like that causes many spurious tool calls.