I don’t want to discredit the authors, but I want to offer a couple of hypothetical points in these paranoid times.
From a marketing angle, for a startup whose product is an AI security tool, buying zero-days on the black market and claiming the AI tool found them might be good ROI. After all, this is making waves.
Or could the training set contain zero-day vulnerabilities known to three-letter agencies and other threat actors but not to the public?
These two are not mutually exclusive either. You could buy exploits and put them in the training set.
I would not be surprised if it is legit though.
To your second point - why would you need this? There are _plenty_ of previously found CVEs to train on.
Also, I don't think the three-letter agencies would share one of the most prized assets they have...