logoalt Hacker News

chapstoday at 6:02 PM1 replyview on HN

Yeah, my experiences match yours and I very, very much work with messy data (FOIA data), though I use postgres instead of duckdb.

Most of the datasets I work with are indeed <10GB but the ones that are much larger follow the same ETL and analysis flows. It helps that I've built a lot of tooling to help with types and memory-efficient inserts. Having to rewrite pipelines because of "that one dataframe API" is exactly what solidified my thoughts around SQL over everything else. So much of my life time has been lost trying to get dataframe and non-dataframe libraries to work together.

Thing about SQL is that it can be taken just about anywhere, so the time spent improving your SQL skills is almost always well worth it. R and pandas much less so.


Replies

gofreddygotoday at 6:21 PM

I advocated for a SQL solution at work this week and it seems to have worked. My boss is wary of the old school style SQL databases with their constraints and just being a pain to work with. As a developer, these pains aren't too hard to get used to or automate or document away for me and never understood the undue dislike of sql.

The fact that I can use sqlite / local sql db for all kinds of development and reliably use the same code (with minor updates) in the cloud hosted solution is such a huge benefit that it undermines anything else that any other solution has to offer. I'm excited about the sql stuff I learned over 10 years ago being of of great use to me in the coming months.

The last product I worked heavily on used a nosql database and it worked fine till you start tweak it just a little bit - split entities, convert data types or update ids. Most of the data access layer logic dealt with conversion between data coming in from the database and the guardrails to keep the data integrity in check while interacting with the application models. To me this is something so obviously solved years ago with a few lines of constraints. Moving over to sql was totally impossible. Learned my lesson, advocated hard for sql. Hoping for better outcomes.

show 1 reply