> None of the systems I mentioned existed at the time the article was published
So Hadoop was doing distributed compute wrong but now they have it figured out?
The point is that there is enormous overhead and complexity in doing it in any kind of distributed system. And your computer has a lot of power you probably aren’t maxing out.
> which is a very speedy CLI SQL thingy that reads and writes data in all sorts of formats.
Do you know about SQLite?
Yeah I’m a big fan of SQLite :). But on analytical workloads, like aggregating every row, DuckDB will outperform SQLite by a wide margin. SQLite is great stuff but it’s not a very good data Swiss Army knife because it’s very focused on a single core competency: embeddable OLTP with a simple codebase. DuckDB can read/write many more formats from local disk or via a variety of network protocols. DuckDB also embeds SQLite so you can use it with SQLite DBs as inputs or outputs.
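For a concrete sense of that flexibility, here’s a minimal sketch using DuckDB’s Python bindings. The file names (data.parquet, app.db) and the table name (some_table) are made up for illustration:

```python
import duckdb

con = duckdb.connect()  # in-memory analytical database

# DuckDB queries Parquet (and CSV, JSON, ...) directly by filename.
print(con.execute("SELECT count(*) FROM 'data.parquet'").fetchone())

# The sqlite extension lets DuckDB attach a SQLite DB as input or output.
con.execute("INSTALL sqlite")
con.execute("LOAD sqlite")
con.execute("ATTACH 'app.db' AS app (TYPE sqlite)")  # hypothetical SQLite file
print(con.execute("SELECT count(*) FROM app.some_table").fetchone())
```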
> they were doing distributed compute wrong but now they have it figured out?
Like anything, the future is here but it’s unevenly distributed. Frank McSherry, the first author of “Scalability! But at what COST?”, wrote Timely Dataflow as his answer to that question. Bytewax is based on Timely, as is Materialize. Stuff is still complex, but these more modern systems, built with performance as their goal, are orders of magnitude better than the Hadoop-era Java stuff.