No joins in that article?
The comments here smell of "real engineers use command line". But I am not sure they ever actually worked with analysing data more than using it as a log parser.
Yes, Hadoop is 2014.
These days you obviously don't set up a Hadoop cluster. You use whatever managed service your cloud provider offers (BigQuery or AWS Athena, for example).
Or, if the data is small, load it into DuckDB or use Polars.
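For what it's worth, here is a rough sketch of the kind of join-plus-aggregate query that is trivial in an embedded SQL engine and painful with grep and awk. To keep it runnable without installing anything, it uses Python's built-in sqlite3 as a stand-in for DuckDB; the tables, columns, and values are all made up for illustration.

```python
import sqlite3

# In-memory database: nothing to install, nothing to administer.
con = sqlite3.connect(":memory:")

# Hypothetical toy schema, invented for this example.
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
""")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "DE"), (2, "US"), (3, "US")],
)
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (3, 7.5), (2, 2.5)],
)

# Join orders to users, then total the revenue per country.
revenue_by_country = con.execute("""
    SELECT u.country, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN users AS u ON u.id = o.user_id
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()

print(revenue_by_country)
```

The same SQL should carry over to DuckDB or a hosted engine more or less unchanged, which is the point: the query is the easy part once the data sits behind a SQL interface.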
> But I am not sure they ever actually worked with analysing data more than using it as a log parser.
It really feels that way. Real data analysis involves a lot more than just grepping logs. And the reason to be wary of starting out unprepared for that kind of analysis is that migrating to a better solution later is a nightmare.
It depends. I’ve done plenty of data processing, including at Fortune 10 companies. Most of the big data could be shrunk to small data if you understood the use case: pre-aggregating, filtering down to smaller datasets based on known analysis patterns, etc.
Now, you could argue that that’s cheating a bit and introduces preprocessing that is as complex as running Hadoop in the first place, but I think it depends.
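To make the pre-aggregation idea concrete, here is a minimal sketch, assuming the known analysis pattern is "paid revenue per product per day". The field names (`ts`, `product`, `status`, `amount`) and the `pre_aggregate` helper are hypothetical, not taken from any real pipeline.

```python
from collections import defaultdict


def pre_aggregate(events, wanted_status):
    """Filter raw events, then roll them up to one row per (day, product)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [event_count, total_amount]
    for ev in events:
        # Filtering: drop rows the known analysis never looks at.
        if ev["status"] != wanted_status:
            continue
        # Pre-aggregating: truncate the ISO timestamp to its date part.
        key = (ev["ts"][:10], ev["product"])
        totals[key][0] += 1
        totals[key][1] += ev["amount"]
    return {key: tuple(value) for key, value in totals.items()}


# Made-up sample data; in practice this would be streamed from files.
raw_events = [
    {"ts": "2024-01-01T09:00:00", "product": "a", "status": "paid", "amount": 10},
    {"ts": "2024-01-01T17:30:00", "product": "a", "status": "paid", "amount": 5},
    {"ts": "2024-01-01T18:00:00", "product": "b", "status": "refunded", "amount": 7},
    {"ts": "2024-01-02T08:15:00", "product": "a", "status": "paid", "amount": 3},
]

summary = pre_aggregate(raw_events, wanted_status="paid")
print(summary)
```

The output has at most one row per day and product no matter how many raw events come in, which is how a dataset that looks big can end up fitting in memory. The catch is the one raised above: it only works if you know the analysis pattern up front.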
In my experience, though, most companies really don’t have big data, and many that do don’t really need to.
Most companies aren’t Fortune 500s.
I used to work at Elastic, and I noticed that most (not all!) of the customers who walked up to me at the conferences were there to ask about datasets that easily fit into memory on a cheap VPS.