Sorry, I tought you meant "support infrastructure" in a much wider sense — yeah, LLMs are frighteningly good at lockpicking tests using source code shaped inputs. It's just that they are also frighteningly good at finding insane ways to game the tests, too. I wouldn't say that LLMs are very "G" in the AI they do — present them with confusing semantics, and they fall off the self-contradiction cliff. No capability of developing theory systematically from observations.