Dataset? That's so 2000s.
Each crawl of the internet is actually a discrete chunk of a more abstractly defined, constant influx of information streams. Let's call them rivers (a river is just a big stream).
These rivers can dry up, shift with the seasons, be poisoned, or be dammed.
A crawl will never "get there" and gather enough data to "be done".
--
Regarding "new ideas in AI", I think there could be some. But this whole thing is not about AI anymore.