Absolutely agreed. My theory is that the more tools you give the agent to constrain the space of possible outputs, the better it will be at producing correct code. My analogy is something like starting a simulated annealing run with bounds and heuristics that rule out whole categories of wrong answers, or perhaps like starting the Sieve of Eratosthenes with a prime wheel to cut down the busywork.
I also think opinionated tooling is important - for example, in the toy language I'm working on, there are no warnings and no ignore pragmas, so the LLM has to confront error messages before it can continue.
It should be impossible for an LLM to generate syntactically invalid code, as long as you constrain decoding so it can only emit tokens the language's grammar allows at each step.
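As a rough sketch of that constrained-decoding idea: before sampling each token, mask out everything the grammar doesn't allow at the current position, so disallowed tokens can never be chosen. Everything here is hypothetical and hand-rolled (the tiny vocabulary, allowed_next, the fake_logits stand-in for the model); a real implementation would derive the mask from the parser state rather than a lookup table.

    import math, random

    VOCAB = ["let", "x", "=", "1", "+", ";", "<eos>"]

    def allowed_next(prefix):
        # Hand-written stand-in for "tokens the language allows here".
        # A real implementation would query the parser/grammar state.
        if not prefix:            return {"let"}
        if prefix[-1] == "let":   return {"x"}
        if prefix[-1] == "x":     return {"="}
        if prefix[-1] == "=":     return {"1"}
        if prefix[-1] == "1":     return {"+", ";"}
        if prefix[-1] == "+":     return {"1"}
        if prefix[-1] == ";":     return {"<eos>"}
        return set()

    def fake_logits(prefix):
        # Stand-in for the LLM: random scores over the vocabulary.
        return [random.gauss(0, 1) for _ in VOCAB]

    def decode(max_len=10):
        prefix = []
        for _ in range(max_len):
            logits = fake_logits(prefix)
            legal = allowed_next(prefix)
            # Mask: disallowed tokens get -inf, so they can never win.
            masked = [s if t in legal else -math.inf
                      for t, s in zip(VOCAB, logits)]
            # Greedy pick among the legal tokens only.
            tok = VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
            if tok == "<eos>":
                break
            prefix.append(tok)
        return " ".join(prefix)

    print(decode())  # e.g. "let x = 1 + 1 ;"

The point isn't the toy grammar; it's that the masking step guarantees every sampled sequence is well-formed by construction, no matter how confused the model's raw scores are.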