If we're talking about datasets, I'd like to draw attention to the biological datasets out there that have yet to be fully harnessed.
The ability to collect gene expression data at a tissue-specific level has only been developed and automated in the last 4-5 years (see 10x Genomics Xenium, MERFISH), and we've only recently figured out how to collect this data at the scale of millions of cells. A breakthrough on this front may be the next big area of advancement.