Over the last few days I've been recommending the insights from XBOW [1]. It's a competitor, but it adds more information to the discussion.
[1] https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...
That is a good article.
Interesting that GPT-5.5, while not as good as Mythos, also seems like a decent step up.
Thanks for sharing. It's definitely more concrete. Some of the things I was hoping to find were the number of false positives, the time it takes to separate false positives from real findings, and the mental toll this triage takes on the people doing it. Did anyone manually verify the exploits identified by the LLM, or were they assumed correct based on the explanation? I do understand that the target audience of these articles is probably decision makers, so the language and content have to be tailored accordingly.