Well, pirated. Piracy and stealing aren't the same thing.
Regardless, I acknowledged the general issue. However, I pointed out that doing so was not a technical necessity. If you base your worldview or actions on X implying Y, and it then turns out that Y was merely a matter of convenience, you're probably going to arrive at the wrong conclusion.
There's also the issue that you're emphatically calling it stealing without providing clear criteria. The legal system as a whole has yet to conclusively resolve the various piracy accusations. The legality of consuming publicly available content remains quite controversial.
It absolutely is a technical necessity. You couldn't build a model from scratch today without doing the same thing. And every model attempting to train on AI-generated output degrades into nonsense almost immediately.
There’s a reason Reddit is making millions of dollars letting these companies mine its human-generated content. You think OpenAI or anyone else would pay for that if they could just cyclically train on AI-generated content???