Well, they can make some up, i.e. hallucinate. That's an additional problem: when the original site that provided the training data is gone, how can they verify the AI output to make sure it's correct?