Hacker News

fifilura · yesterday at 2:30 PM · 2 replies

No joins in that article?

The comments here smell of "real engineers use the command line". But I am not sure they have ever actually analysed data beyond using the command line as a log parser.

Yes, Hadoop is 2014.

These days you obviously don't set up a Hadoop cluster yourself. You use the managed service your cloud provider offers (BigQuery or AWS Athena, for example).

Or load your data into DuckDB, or use Polars if it is small.
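
A minimal sketch of what that looks like (the file names and columns here are made up): DuckDB's Python API can join and aggregate Parquet files in place, no cluster involved.

    import duckdb

    # Hypothetical files and columns; DuckDB queries CSV/Parquet files in place, no load step.
    top_customers = duckdb.sql("""
        SELECT c.country, o.customer_id, SUM(o.amount) AS total
        FROM 'orders.parquet' o
        JOIN 'customers.parquet' c ON o.customer_id = c.customer_id
        GROUP BY c.country, o.customer_id
        ORDER BY total DESC
        LIMIT 10
    """)
    top_customers.show()   # or .df() / .pl() to hand off to pandas / Polars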


Replies

christophilus · yesterday at 11:50 PM

It depends. I’ve done plenty of data processing, including at large Fortune 10s. Most of the big data could be shrunk to small data if you understood the use case: pre-aggregating, filtering to smaller datasets based on known analysis patterns, etc.
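
As a rough sketch of what I mean (the event schema is hypothetical), a lazy Polars scan can do that kind of filtering and pre-aggregation up front, so only the small result has to live anywhere expensive:

    import polars as pl

    # Hypothetical schema: raw event logs in Parquet, far too big to load whole.
    events = pl.scan_parquet("events/*.parquet")   # lazy: nothing is read yet

    daily = (
        events
        .filter(pl.col("event_type") == "purchase")              # keep only what the analysis needs
        .group_by("customer_id", pl.col("ts").dt.date().alias("day"))
        .agg(pl.col("amount").sum().alias("revenue"))
        .collect()                                                # executes the whole plan
    )

    daily.write_parquet("daily_purchases.parquet")                # small enough for a laptop from here on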

Now, you could argue that that’s cheating a bit and introduces preprocessing that is as complex as running Hadoop in the first place, but I think it depends.

In my experience, though, most companies really don’t have big data, and many that do don’t really need to treat it as big data.

Most companies aren’t Fortune 500s.

I used to work at Elastic, and I noticed that most (not all!) of the customers who walked up to me at the conferences were there to ask about datasets that easily fit into memory on a cheap VPS.

ziml77 · yesterday at 3:47 PM

> But I am not sure they have ever actually analysed data beyond using the command line as a log parser.

It really feels that way. Real data analysis involves a lot more than just grepping logs. And the reason to be wary of starting out unprepared for that kind of analysis is that migrating to a better solution later is a nightmare.
