Using duckdb and apache iceberg means that you can run read replicas without any operational burden....

maxmcd • 11/07/2024 • 1 reply • view on HN

Using duckdb and apache iceberg means that you can run read replicas without any operational burden. Clickhouse is amazing, but they do not allow you to mount dumb read replicas to object storage (yet).

I can imagine this product is a very elegant solution for many types of companies/teams/workloads.

Replies

zX41ZdbW • 11/07/2024

You can mount read replicas on object storage in ClickHouse.

Example:

    CREATE DATABASE test;
    USE test;

    CREATE TABLE hackernews_history UUID '66491946-56e3-4790-a112-d2dc3963e68a'
    (
        `update_time` DateTime DEFAULT now(),
        `id` UInt32,
        `deleted` UInt8,
        `type` Enum8('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
        `by` LowCardinality(String),
        `time` DateTime,
        `text` String,
        `dead` UInt8,
        `parent` UInt32,
        `poll` UInt32,
        `kids` Array(UInt32),
        `url` String,
        `score` Int32,
        `title` String,
        `parts` Array(UInt32),
        `descendants` Int32
    )
    ENGINE = ReplacingMergeTree(update_time)
    ORDER BY id
    SETTINGS disk = disk(readonly = true, type = 's3_plain_rewritable', endpoint = 'https://clicklake-test-2.s3.eu-central-1.amazonaws.com/', use_environment_credentials = false);

And you can try it right now.

Install ClickHouse:

    curl https://clickhouse.com/ | sh
    ./clickhouse local

Run the query above to attach the table.

The table is updated in real time. For example, here is your comment:

    :) SELECT * FROM hackernews_history WHERE text LIKE '%Clickhouse is amazing%' ORDER BY update_time \G

    Row 1:
    ──────
    update_time: 2024-04-06 16:35:28
    id:          39785472
    deleted:     0
    type:        comment
    by:          mightybyte
    time:        2024-03-21 22:59:20
    text:        I&#x27;ll second this.  Clickhouse is amazing.  I was actually using it today to query some CSV files.  I had to refresh my memory on the syntax so if anyone is interested:<p><pre><code>  clickhouse local -q &quot;SELECT foo, sum(bar) FROM file(&#x27;foobar.csv&#x27;, CSV) GROUP BY foo FORMAT Pretty&quot;
    </code></pre>
    Way easier than opening in Excel and creating a pivot table which was my previous workflow.<p>Here&#x27;s a list of the different input and output formats that it supports.<p><a href="https:&#x2F;&#x2F;clickhouse.com&#x2F;docs&#x2F;en&#x2F;interfaces&#x2F;formats" rel="nofollow">https:&#x2F;&#x2F;clickhouse.com&#x2F;docs&#x2F;en&#x2F;interfaces&#x2F;formats</a>
    dead:        0
    parent:      39784942
    poll:        0
    kids:        [39788575]
    url:         
    score:       0
    title:       
    parts:       []
    descendants: 0

    Row 2:
    ──────
    update_time: 2024-04-06 18:07:34
    id:          31334599
    deleted:     0
    type:        comment
    by:          richieartoul
    time:        2022-05-11 00:54:31
    text:        Not really. Clickhouse is amazing, but if you want to run it at massive scale you’ll have to invest a lot into sharding and clustering and all that. Druid is more distributed by default, but doesn’t support as sophisticated of queries as Clickhouse does.<p>Neither Clickhouse nor Druid can hold a candle to what Snowflake can do in terms of query capabilities, as well as the flexibility and richness of their product.<p>That’s just scratching the surface. They’re completely different product categories IMO, although they have a lot of technical &#x2F; architectural overlap depending on how much you squint.<p>Devil is in the details basically.
    dead:        0
    parent:      31334527
    poll:        0
    kids:        [31334736]
    url:         
    score:       0
    title:       
    parts:       []
    descendants: 0

    Row 3:
    ──────
    update_time: 2024-11-07 22:29:09
    id:          42081672
    deleted:     0
    type:        comment
    by:          maxmcd
    time:        2024-11-07 22:13:12
    text:        Using duckdb and apache iceberg means that you can run read replicas without any operational burden. Clickhouse is amazing, but they do not allow you to mount dumb read replicas to object storage (yet).<p>I can imagine this product is a very elegant solution for many types of companies&#x2F;teams&#x2F;workloads.
    dead:        0
    parent:      42080385
    poll:        0
    kids:        []
    url:         
    score:       0
    title:       
    parts:       []
    descendants: 0

    3 rows in set. Elapsed: 3.981 sec. Processed 42.27 million rows, 14.45 GB (10.62 million rows/s., 3.63 GB/s.)
    Peak memory usage: 579.26 MiB.

➕ show 5 replies

alt Hacker News

Replies