Maybe I'm not understanding it (totally possible!) but could Claude just do that by reading standard out and writing to standard in?
Also, many CLIs behave differently when they're attached to a terminal (TUI/interactive mode) than when they aren't. So with plain pipes you'd hit a wall: Claude could only test the non-interactive code paths.
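For anyone who hasn't run into this, here's a minimal sketch of the check most CLIs do under the hood. The `mode_for` helper is a made-up name for illustration, not from any particular tool; the only real API here is `os.isatty`.

```python
import os
import sys


def mode_for(fd: int) -> str:
    """Return the mode a typical CLI would pick for this file descriptor.

    Hypothetical helper: real tools do the same isatty check, but each
    decides for itself what "interactive" changes (colors, prompts, TUI).
    """
    return "interactive" if os.isatty(fd) else "non-interactive"


# A plain pipe is what a naive "read stdout, write stdin" harness gives the child.
read_fd, write_fd = os.pipe()
pipe_mode = mode_for(write_fd)
os.close(read_fd)
os.close(write_fd)

print("child writing to a pipe would see:", pipe_mode)
print("this process's stdout is a tty:", sys.stdout.isatty())
```

Run it straight from a shell and then again piped through `cat`, and the second line should flip, which is the same thing the CLI under test sees.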
I had a really hard time getting anything like that to work (you can't just read stdout and write stdin, because you're driving a terminal in raw mode), but it took only about 3 sentences' worth of prompting to get Claude to use tmux to do this reliably.
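Roughly what the tmux approach boils down to, as a sketch: start the program in a detached session so it gets a real terminal, type into it with `send-keys`, and read the screen back with `capture-pane`. This is not the actual prompt or what Claude literally ran; the function names, the session name, and the fixed `settle` delay are all assumptions made for illustration. Only the tmux subcommands and flags are real.

```python
import subprocess
import time


def new_session_cmd(session: str, program: str, cols: int = 80, rows: int = 24) -> list[str]:
    """Command to start `program` in a detached tmux session of a fixed size."""
    return ["tmux", "new-session", "-d", "-s", session,
            "-x", str(cols), "-y", str(rows), program]


def send_text_cmd(session: str, text: str) -> list[str]:
    """Command to type `text` literally (-l stops tmux treating it as key names)."""
    return ["tmux", "send-keys", "-t", session, "-l", text]


def send_key_cmd(session: str, key: str) -> list[str]:
    """Command to press a named key such as Enter or Escape."""
    return ["tmux", "send-keys", "-t", session, key]


def capture_cmd(session: str) -> list[str]:
    """Command to print the pane's visible screen contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", session]


def drive(session: str, program: str, lines: list[str], settle: float = 0.2) -> str:
    """Type each line into `program` and return the final screen as text.

    Assumption: a fixed sleep is enough for the program to redraw. A real
    harness would more likely poll capture-pane until the screen stops changing.
    """
    subprocess.run(new_session_cmd(session, program), check=True)
    try:
        for line in lines:
            subprocess.run(send_text_cmd(session, line), check=True)
            subprocess.run(send_key_cmd(session, "Enter"), check=True)
            time.sleep(settle)
        result = subprocess.run(capture_cmd(session), check=True,
                                capture_output=True, text=True)
        return result.stdout
    finally:
        # Always clean up so a failed run doesn't leave a stray session behind.
        subprocess.run(["tmux", "kill-session", "-t", session], check=False)
```

With tmux installed, something like `print(drive("demo", "python3", ["print(1 + 1)"]))` should hand back the REPL screen as plain text, which is the part that pipes alone never give you.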