DuckDB Internals: Why Is DuckDB Fast? (Part 1)

80 points • by marklit • last Tuesday at 11:07 AM • 39 comments

Comments

If you're reading this and curious: consider writing a duckdb community extension* or contributing to an existing one*

DuckDB is becoming a kind of data superglue between a lot of data ecosystems (GIS, observability, analytics, lakehouses, object storage, etc.) that don't typically talk to each other, and it's worth checking out in 2026.

* https://github.com/duckdb/extension-template
* https://duckdb.org/community_extensions/

0xferruccio • today at 5:35 AM

DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop

Recently at work I've been using it to analyse the Claude Code sessions of every engineer at our company (which we upload to S3), and it's been extremely helpful for finding gaps in devex and getting clear metrics to back up the impact of fixing them

Another thing it's been really useful for is getting metrics on Claude skills usage and then diving into use cases by looking at the transcripts

Other engineers who had never touched DuckDB were so impressed with how easy it is for AI agents to write queries on our dataset

anitil • today at 4:48 AM

DuckDB makes so much of my life easier, though I've never used it for large problems. The ability to run `select * from 'data.json'` is just lovely. The fact that it's also a powerhouse is so impressive; I'd usually expect a project to be good at small problems (like mine) xor large problems, but not both

Panzerschrek • today at 5:49 AM

If DuckDB is so fast and has no data transfer overheads, does it need all this typical SQL machinery with filtering and joining via SELECT queries? Wouldn't it be simpler and faster to return all data to the caller code (all table rows, but only requested columns) and let it perform all other necessary data processing logic?

snissn • today at 5:56 AM

I'm just curious - is DuckDB too slow for people? This benchmark from ClickHouse shows it being fairly slow compared to some options: https://jsonbench.com/

steve_adams_86 • today at 4:37 AM

> DuckDB has received widespread adoption because it's just so damn easy to use.

This was a major factor in my initial adoption. Since then it has stuck because it’s also absurdly capable, versatile, and fast.

If it wasn’t so easy to use I suspect I wouldn’t have adopted it when I did. The ergonomics are crazy. It still impresses me regularly.

pknerd • today at 5:48 AM

FTA:

> ..In-process means there's no server. You don't connect to DuckDB; you load it as a library inside your program, the same way you'd load NumPy or Polars

Does that mean it can perform all statistical computations as well, if I want to use it for algo trading?

jdw64 • today at 5:14 AM

The data scientists I work with use this. Why do they use it? I don't really know much about it, but I've noticed they use it quite often. I mainly use MySQL or PostgreSQL. What are the advantages of DuckDB? It seems like they usually use it as an alternative to Pandas.

holografix • today at 5:48 AM

Why is DuckDB so popular when one can use Python + Pandas?

Better perf + SQL, is that mostly it?

f311a • today at 6:04 AM

I wish this article was not LLM-written

codingbear • today at 5:24 AM

DuckDB is so nice coupled with Claude Code. Its extensive file support and some very interesting decisions on locally caching data (like from S3 or Snowflake) make it easy to slice and dice almost any kind of tabular data.

thefourthchime • today at 4:42 AM

I’m a huge fan, and I’ve been wanting to get into the internals. Looking forward to digging in.

pknerd • today at 5:52 AM

Umm, can we say it can replace SQLite?

