Have a look at this article: https://www.washingtonpost.com/technology/interactive/2023/a...
The NY Times is 0.06% of Common Crawl.
These news media outlets provide a drop in the ocean's worth of information, both qualitatively and quantitatively.
The news / media industry is really just trying to hold on to their lifeboat before inevitably becoming entirely irrelevant.
(I do find this sad, but it is the reality: I can already get considerably better journalism from LLMs than from actual journalists, both the clickbait stuff and the high-quality stuff.)
90% of Common Crawl is complete junk, while the tiny bit of news articles powers almost all the AI answers in Google Search.
How many Reddit, HN, etc. posts are based on NYT articles? How many derivative news articles, blog posts, YouTube videos, TikToks, etc. are responses to those articles?
At least NYT is probably on the correct side of Sturgeon’s Law: https://en.wikipedia.org/wiki/Sturgeon%27s_law
0.06% is way higher than I would expect.
That seems like a reductive way to consider it. What percent of music was created by Led Zeppelin? What percent of art was painted by Monet? What percent of films by Alfred Hitchcock? It may be a small percentage objectively but they are hugely influential.