This assumes the regex is doing a good job. It is not. Also, you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know Anthropic's goal with this); it would be quick and free.
I think it's a very reasonable tradeoff: getting 99% of true positives at a fraction of the cost (both runtime and engineering).
Besides, they probably run a separate analysis on the server side either way, so they can check the ratio of true positives to false positives.
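For what it's worth, checking that ratio is cheap if you have even a small hand-labeled sample. A rough Python sketch follows; the pattern, the sample messages, and the function names are all made up for illustration and are not anything Anthropic is known to ship:

```python
import re

# Hypothetical keyword pattern, purely for illustration.
FLAG_PATTERN = re.compile(r"\b(terrible|useless|broken)\b", re.IGNORECASE)


def confusion_counts(samples, pattern=FLAG_PATTERN):
    """Return (tp, fp, fn, tn) for an iterable of (text, is_negative) pairs."""
    tp = fp = fn = tn = 0
    for text, is_negative in samples:
        flagged = pattern.search(text) is not None
        if flagged and is_negative:
            tp += 1
        elif flagged:
            fp += 1
        elif is_negative:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labeled messages: (text, is_negative).
SAMPLES = [
    ("this is useless", True),
    ("that was a terrible answer", True),
    ("not what I asked for at all", True),       # negative, but no keyword: a miss
    ("fix the broken test in utils.py", False),  # keyword, but not a complaint
    ("thanks, looks good", False),
]

tp, fp, fn, tn = confusion_counts(SAMPLES)
precision, recall = precision_recall(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} tn={tn} "
      f"precision={precision:.2f} recall={recall:.2f}")
```

The two commented samples are the failure modes people argue about here: a complaint with no keyword gets missed, and a neutral message that happens to contain a keyword gets flagged.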