What are companies needing all of these hard drives for? I understand their need for memory, and boot. But storing text training data and text conversations isn't that space intensive. There's a few companies doing video models, so I can see how that takes a tremendous amount of space. Is it just that?
Hearing about their scraping practices, it might be that they are storing the same data over and over again. And then yes, audio and video are likely something they are planning for or already gathering.
And if they produce a lot of video, they might keep copies around.
All the latest general-purpose models are multimodal (except DeepSeek, I think). Transfer learning lets them keep improving results even after they've exhausted all the text on the internet.
Storing training data: for example, Anthropic bought millions of second-hand books and scanned them:
https://www.washingtonpost.com/technology/2026/01/27/anthrop...
I think the somewhat hallucinatory canned response is that they distribute data across drives for massive throughput. Though idk if that even technically makes sense...
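For what it's worth, the throughput argument is at least arithmetically plausible: striping reads across many drives (RAID-0 style) multiplies sequential bandwidth. A quick back-of-envelope sketch, where the per-drive number is an assumption on my part, not a vendor spec:

```python
# Back-of-envelope: ideal aggregate sequential read throughput when
# data is striped across many HDDs. Ignores controller, network, and
# filesystem overhead; 250 MB/s per drive is an assumed ballpark figure.
def aggregate_throughput_gbps(num_drives: int, per_drive_mbps: float = 250.0) -> float:
    """Ideal aggregate throughput in GB/s across striped drives."""
    return num_drives * per_drive_mbps / 1000.0

# One shelf of 100 drives could in theory stream ~25 GB/s sequentially,
# which starts to look useful for feeding training jobs.
print(aggregate_throughput_gbps(100))  # → 25.0
```

Whether that's actually why they're buying drives is another question, but the math checks out.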
I am surprised by that too. I thought everyone had moved to SSDs or NVMe?
I was toying with getting a 2T HDD for a BSD system I have, I guess not now :)
Speaking from personal experience.. we treat cloud storage like an infinitely deep bucket. At-rest data efficiency is not really a consideration because compute costs are so absurd. Why worry about a $2M/year storage bill when your compute bill is $500M? It’s not worth the engineering time to optimize.
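To put those numbers side by side (using the figures from the comment above, not real billing data):

```python
# The point in numbers: storage spend is rounding error next to compute.
# Both figures are the hypotheticals from the comment, not actual bills.
storage_per_year = 2_000_000      # $2M/year storage
compute_per_year = 500_000_000    # $500M/year compute

share = storage_per_year / compute_per_year
print(f"Storage is {share:.1%} of the compute bill")  # → Storage is 0.4% of the compute bill
```

At 0.4%, even halving storage costs saves less than a day's worth of the compute budget, so nobody staffs the optimization.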