Hacker News

fifilura · yesterday at 2:30 PM · 2 replies

No joins in that article?

The comments here smell of "real engineers use the command line". But I am not sure they have ever actually analysed data beyond using the command line as a log parser.

Yes, Hadoop is 2014.

These days you obviously don't set up a Hadoop cluster yourself. You use the managed service your cloud provider offers (BigQuery or AWS Athena, for example).

Or load your data into DuckDB, or use Polars if it is small.
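
A minimal sketch of what that looks like (the file names and columns here are made up): DuckDB's Python API can join and aggregate Parquet files in place, no cluster involved.

    import duckdb

    # Hypothetical files and columns; DuckDB queries CSV/Parquet files in place, no load step.
    top_customers = duckdb.sql("""
        SELECT c.country, o.customer_id, SUM(o.amount) AS total
        FROM 'orders.parquet' o
        JOIN 'customers.parquet' c ON o.customer_id = c.customer_id
        GROUP BY c.country, o.customer_id
        ORDER BY total DESC
        LIMIT 10
    """)
    top_customers.show()   # or .df() / .pl() to hand off to pandas / Polars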


Replies

christophilus · yesterday at 11:50 PM

It depends. I’ve done plenty of data processing, including at large Fortune 10s. Most of the big data could be shrunk to small data if you understood the use case: pre-aggregating, filtering to smaller datasets based on known analysis patterns, etc.
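
As a rough sketch of what I mean (the event schema is hypothetical), a lazy Polars scan can do that kind of filtering and pre-aggregation up front, so only the small result has to live anywhere expensive:

    import polars as pl

    # Hypothetical schema: raw event logs in Parquet, far too big to load whole.
    events = pl.scan_parquet("events/*.parquet")   # lazy: nothing is read yet

    daily = (
        events
        .filter(pl.col("event_type") == "purchase")              # keep only what the analysis needs
        .group_by("customer_id", pl.col("ts").dt.date().alias("day"))
        .agg(pl.col("amount").sum().alias("revenue"))
        .collect()                                                # executes the whole plan
    )

    daily.write_parquet("daily_purchases.parquet")                # small enough for a laptop from here on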

Now, you could argue that that’s cheating a bit and introduces preprocessing that is as complex as running Hadoop in the first place, but I think it depends.

In my experience, though, most companies really don’t have big data, and many that do don’t really need to treat it as big data.

Most companies aren’t Fortune 500s.

I used to work at Elastic, and I noticed that most (not all!) of the customers who walked up to me at the conferences were there to ask about datasets that easily fit into memory on a cheap VPS.

ziml77 · yesterday at 3:47 PM

> But I am not sure they have ever actually analysed data beyond using the command line as a log parser.

It really feels that way. Real data analysis involves a lot more than just grepping logs. And the reason to be wary of starting out unprepared for that kind of analysis is that migrating to a better solution later is a nightmare.
