The wonderful thing, though, is that you can just run the model multiple times (even in parallel). Some instances might get stuck, but as long as some of them find the bug and you have a good way to filter the outputs (e.g. with another LLM that tries to create concrete exploits), even a very small success rate on stage 1 can lead to reliable exploits.
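To put rough numbers on the "small success rate becomes reliable" intuition, here is a minimal sketch. It assumes runs are independent, that each run succeeds with the same probability `p`, and that the filter catches a true success with probability `filter_recall` (1.0 means a perfect filter, with no false positives modeled). The function names and the 2% figure are hypothetical and purely illustrative.

```python
import math


def p_at_least_one(p: float, n: int, filter_recall: float = 1.0) -> float:
    """Chance that at least one of n independent runs yields a success
    that the filter actually recognizes (assumes independent runs)."""
    effective = p * filter_recall  # per-run chance of a *detected* success
    return 1.0 - (1.0 - effective) ** n


def runs_needed(p: float, target: float, filter_recall: float = 1.0) -> int:
    """Smallest n such that p_at_least_one(p, n, filter_recall) >= target."""
    effective = p * filter_recall
    if not (0.0 < effective < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("need 0 < p * filter_recall < 1 and 0 < target < 1")
    # Solve 1 - (1 - e)^n >= target  =>  n >= log(1 - target) / log(1 - e)
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - effective))


if __name__ == "__main__":
    p = 0.02  # hypothetical 2% per-run success rate on stage 1
    for n in (10, 50, 100, 200):
        print(n, round(p_at_least_one(p, n), 3))
    print("runs for 99%:", runs_needed(p, 0.99))
    print("runs for 99% with a 50%-recall filter:", runs_needed(p, 0.99, 0.5))
```

Under these assumptions, a 2% per-run rate reaches a 99% chance of at least one detected success after 228 runs. A leaky filter simply scales down the effective per-run rate, so you pay for it with more runs rather than losing reliability outright. Real runs are probably correlated (the same prompt tends to get stuck in the same places), so treat this as an optimistic bound.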