I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, since it's built for exactly these tasks. Like SQLite, DuckDB reads from a single file, and it processes large data sets at extreme speed. I work on my MacBook M2 and have been dealing with about 20 million records, and it works fast, very fast.
Loading data into DuckDB is super easy; I was surprised:
SELECT avg(sale_price), count(DISTINCT customer_id) FROM '/my-data-lake/sales/2024/*.json';
You can also load into a JSON-typed column and use Postgres-style syntax like col->>'$.key'.
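A minimal sketch of that JSON syntax, using a made-up table and keys just for illustration:

CREATE TABLE events (payload JSON);
INSERT INTO events VALUES ('{"customer_id": 42, "sale_price": 19.99}');
-- ->> extracts the value as text, -> keeps it as JSON
SELECT payload->>'$.customer_id' AS customer_id, payload->'$.sale_price' AS sale_price FROM events;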
Whoa. Is that first query building an index over arbitrary JSON files on the filesystem on the fly?
Can we all agree to never store datasets uncompressed? DuckDB supports reading many compression formats.
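For instance, a sketch continuing the example above (the gzipped path is hypothetical; DuckDB detects gzip from the .gz extension):

SELECT avg(sale_price), count(DISTINCT customer_id) FROM read_json_auto('/my-data-lake/sales/2024/*.json.gz');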
DuckDB is super fast for analytic tasks, especially when you use it with a visual EDA tool like pygwalker. It lets you handle EDA and visualizations over millions of rows in seconds.
But I'd say comparing DuckDB and SQLite is a little unfair. I would still use SQLite to build systems in most cases, and DuckDB only for analytics. It's hard to get a smooth deployment across many platforms if your app bundles DuckDB.