This is absolutely a new type of nondeterministic tool, so you're spot on there.
One of the key things we realized once we started using it is that the approach lets you mix deterministic and nondeterministic tools together as part of a composable chain.
So you can, for example, use an LLM for its evaluation capabilities via a natural language script, wrap that script in deterministic code as part of a broader chain, and also have the plain language script itself call deterministic code nested inside it.
So it allows us to create pipelines that combine the best of both approaches as appropriate based on the sub-task at hand.
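To make the chain idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `stub_llm_judge` is a random stand-in for a real LLM call (no actual model or API is used), and the labels and function names are invented for illustration only.

```python
import random

# Closed set of labels the deterministic wrapper will accept (assumed for this sketch).
ALLOWED = {"pass", "fail"}


def normalize(text: str) -> str:
    """Deterministic pre-processing: the same input always gives the same output."""
    return " ".join(text.split()).lower()


def stub_llm_judge(text: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM evaluation; its output varies between runs."""
    return rng.choice(["pass", "fail", "Pass.", "maybe"])


def validate(label: str) -> str:
    """Deterministic post-check: map raw model output onto a closed set of labels."""
    cleaned = label.strip().strip(".").lower()
    return cleaned if cleaned in ALLOWED else "needs_review"


def run_chain(text: str, rng: random.Random) -> str:
    """Deterministic step, then nondeterministic step, then deterministic step."""
    return validate(stub_llm_judge(normalize(text), rng))


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(run_chain("  Is this   summary faithful?  ", rng))
```

Note what the wrapper does and does not buy you in this sketch: the output is always one of three known labels, but which label you get can still differ from run to run.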
If you mix deterministic and nondeterministic steps, then the result is nondeterministic.
Which means your entire pipeline is tainted.
If your process is fine with that, whatever, but don't pretend that the result can be controlled.