> The ones that were still under copyright are a different matter.
Given the sheer volume of information posted to the Internet in the last 40-50 years, I'd wager that still-copyrighted material makes up 80% or more of the relevant input data.
Old text is relatively scarce in the grand scheme of things.
But I have no real clue, I'm just spitballing.