To add to the anecdata, today GPT 5.2-whatever hallucinated the existence of two CLI utilities, and when corrected, it then hallucinated plausible-sounding but non-existent features/options of CLI utilities that do actually exist.
I had to dig through source code to confirm whether those features actually existed. They don't, so the CLI tools GPT recommended aren't actually applicable to my use case.
Yesterday, it hallucinated features of WebDAV clients, and then talked up an abandoned, incomplete GitHub project with a dozen stars as if it were the perfect fit for what I was trying to do, when it wasn't.
I only remember these because they're recent and CLI-related, given the topic, but I have experiences like this daily across different subjects and domains.
Yeah, it needs a steady hand on the tiller. However, chain together improvements of 70%, -15%, 95%, 99%, and -7% across all the steps, and overall you're way ahead.
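To make the "way ahead" claim concrete, here's a minimal sketch of that compounding arithmetic, assuming each percentage is a multiplicative gain or loss applied in sequence (my framing, not the commenter's):

```python
from functools import reduce

# Illustrative assumption: treat each step's improvement as a
# multiplicative factor (70% -> x1.70, -15% -> x0.85, etc.)
# and compound them across the whole workflow.
step_improvements = [0.70, -0.15, 0.95, 0.99, -0.07]

overall = reduce(lambda acc, pct: acc * (1 + pct), step_improvements, 1.0)
print(f"Overall factor: {overall:.2f}x")  # roughly 5.21x
```

Even with two steps regressing, the compounded result is about a 5x improvement, which is the point: occasional negative steps don't sink the total if the positive ones are large.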
SimonW's approach of having a suite of dynamic tools (agents) grind out the hallucinations is a big improvement.
In this case, expressing the validation feedback and investing in the setup may help smooth these sharp edges.
Were you running it inside a coding agent like Codex?
If so, it should have realized its mistake when it tried to run those CLI commands and saw the error messages, and then tried something different instead.
If you were using a regular chat interface and expecting it to know everything without having an environment to try things out in, then yeah, you're going to be disappointed.