It's been a lifesaver for some analysis I had to do on 70GB of Cloudflare logs.
So is DuckDB a database, or a CLI tool for querying all sorts of file formats with SQL statements? I've only used it as a CLI tool, so I don't understand the comparison to a database, which stores your data reliably in addition to answering your SQL queries.
I benchmarked DuckDB 1.5.2 with the latest Java JDBC driver, which now supports user-defined functions. This allows very fast modifications: https://sqg.dev/blog/java-duckdb-benchmark/
Data engineer here: I use this all the time. It's amazing. For most of the data sizes we deal with, it's perfect.
Did they finally enable full SIMD, or do they keep insisting it's okay not to have it?
I use DuckDB often too, but the way it is being hyped in these comments makes me feel like I'm missing out on some insane use case.
I basically use it to load CSV, JSONL, Parquet, and similar formats and do arbitrary transformations. Are people doing something else with it?
DuckDB is a generational technology innovation. Insanely good ergonomics, great performance, it's awesome.
I found it unusable due to out-of-memory errors with a billion-row, 8-column dataset.
It needs manual tuning to avoid those errors, and I couldn’t find the right incantation. Nor should I need to: memory management is the job of the DB, not me. Far too flaky for any production usage.
DuckDB also runs in Excel, by the way, via the free xlwings Lite add-in that you can install from the add-in store. It uses the Python package and lets you write scripts and custom functions, as well as use a Jupyter-like notebook workflow.