We largely agree, we don't think pure LLM-based approaches are sufficient. Having an LLM automatically orchestrate tools, like a software fuzzer, is something we've been thinking about for a while and we view incorporating code execution as the first step.
We think that LLMs are able to capture semantic bugs that traditional software testing cannot find in a hands-off way, and ideas from both worlds will be needed for a truly holistic bug finder.