I've experienced something like this at work, but with a data warehouse instead, and it happened multiple times (to be fair, data engineering is still fairly new where I'm from).
One example: an engineer wanted to build an API that accepts large CSVs (GBs of credit reports), extracts some data, and performs some aggregations. He was in the middle of discussing with the SREs the best way to process the huge CSV files without using a k8s StatefulSet, and the solution he was about to build was basically writing to S3, having a worker asynchronously load and process the CSV in chunks, and finally writing the aggregations to the db.
I stepped in and told him he was about to build a data warehouse. :P
If it was less than 100 GB, he probably should have just loaded the whole thing into RAM on a single machine and processed it in one shot. No S3, no network round trips, no chunking, no data warehouse.
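
Roughly what I mean, as a minimal sketch. The file name, column names (customer_id, balance), table name, and connection string are all made up for illustration; assume pandas and SQLAlchemy are available:

    import pandas as pd
    from sqlalchemy import create_engine

    # A few GB of CSV fits in RAM on one reasonably sized machine.
    df = pd.read_csv("credit_reports.csv")

    # Whatever aggregation the API needs, e.g. per-customer totals.
    summary = df.groupby("customer_id")["balance"].agg(["sum", "mean", "count"])

    # The aggregated result is tiny compared to the input, so that's
    # the only thing that ever needs to touch the db.
    engine = create_engine("postgresql://localhost/reports")
    summary.to_sql("credit_report_summary", engine, if_exists="replace")

One box, one process, done in minutes.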