Are you tunnel-visioning on detecting the outputs of a technology using that same underlying technology? That seems like an inescapable product risk that I would bet against.
If you discover the best AI detection technology, it will eventually be used as a (negative) reward function when training future models. It's an arms race.
Would you be open to alternative techniques, like metadata, source analysis, or online (fuzzy) reverse searching? Consider that judges and court experts don't rely solely on the evidence itself to determine whether it's 'real'; to an even greater extent, they rely on how it was collected. If evidence hasn't been sourced correctly, judges won't even look at it.
Detection is one layer, not the whole solution. We've also built an AI model that inserts invisible watermarks for content provenance.
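For readers unfamiliar with how watermark-based provenance can work, here is a toy spread-spectrum sketch in plain NumPy: embed a keyed pseudorandom signature at low amplitude, then detect it by correlation. The parameters, function names, and approach are illustrative assumptions, not the actual watermarking model described above.

```python
# Toy spread-spectrum audio watermark (illustration only; NOT the
# production system referenced in this thread).
import numpy as np

STRENGTH = 0.005  # embedding amplitude (assumed, tuned for this demo)

def embed_watermark(audio: np.ndarray, key: int) -> np.ndarray:
    """Add a keyed +/-1 chip sequence at low amplitude."""
    rng = np.random.default_rng(key)
    signature = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + STRENGTH * signature

def detect_watermark(audio: np.ndarray, key: int) -> bool:
    """Correlate against the keyed signature; near STRENGTH if marked,
    near zero otherwise."""
    rng = np.random.default_rng(key)
    signature = rng.choice([-1.0, 1.0], size=audio.shape)
    score = float(np.dot(audio, signature) / len(audio))
    return score > 0.5 * STRENGTH

# One second of noise at 16 kHz standing in for real audio.
clean = np.random.default_rng(0).standard_normal(16000) * 0.1
marked = embed_watermark(clean, key=42)
print(detect_watermark(marked, key=42))  # marked audio, correct key
print(detect_watermark(clean, key=42))   # unmarked audio
```

The detector only works if you hold the key, which is the point: provenance becomes a property of how the content was produced rather than a statistical guess about the content itself, sidestepping the detector-as-reward-signal arms race.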
We're also actively working on source tracing: https://github.com/piotrkawa/audio-deepfake-source-tracing