Very interesting! Can you give more info on how this could be used, for instance, in my IoT case where I want to keep the last 3 months (say) of data in Postgres, dump older data as Parquet/Iceberg on S3, and still be able to run analytical queries on the past data? Would that be hard to do?
And how does the real-time update work? Could I make it so that my latest data is incrementally synced to S3 (e.g. "the last 3-month block" is incrementally updated efficiently each time there is new data)?
Do you have example code / setup for this?
You can store all data in ClickHouse (on S3 or on local storage); there is no need to separate historical and real-time data.
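For the "keep 3 months hot, older data on S3" shape of your question, one common pattern on self-managed ClickHouse is a single MergeTree table with a tiered storage policy, where a TTL rule moves parts older than 3 months to an S3-backed volume. A rough sketch follows; the table and column names, the storage policy name `tiered`, and the volume name `s3_cold` are all placeholders, and the S3 disk and policy would have to be declared in the server's storage configuration first:

```sql
-- Hypothetical IoT readings table, partitioned by month so whole parts
-- can age out of local disk and move to the S3-backed volume.
CREATE TABLE iot_readings
(
    device_id UInt64,
    ts        DateTime,
    metric    LowCardinality(String),
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (device_id, ts)
-- Assumes a storage policy named 'tiered' with a volume 's3_cold' defined in server config.
TTL ts + INTERVAL 3 MONTH TO VOLUME 's3_cold'
SETTINGS storage_policy = 'tiered';
```

Queries against this table span both tiers transparently, so analytics over "past data" don't need a separate query path.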
To get data into ClickHouse, you use the INSERT query, and you can insert as frequently as you'd like.
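For example, appending new readings to the hypothetical table sketched above is just an INSERT from your ingestion pipeline; for many small, frequent inserts, ClickHouse's asynchronous insert settings let the server batch rows for you:

```sql
-- Plain insert of a small batch of readings (table and columns from the sketch above).
INSERT INTO iot_readings (device_id, ts, metric, value) VALUES
    (42, now(), 'temperature', 21.7),
    (42, now(), 'humidity',    48.0);

-- For high-frequency trickle inserts, async inserts buffer and batch rows server-side.
INSERT INTO iot_readings
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (42, now(), 'temperature', 21.8);
```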
Alternatively, you can set up continuous replication from Postgres to ClickHouse, which is available in ClickHouse Cloud.
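In ClickHouse Cloud that replication is configured through the console (ClickPipes) rather than SQL, so there isn't much code to show. On a self-managed server, a rough (experimental) equivalent is the MaterializedPostgreSQL database engine, which mirrors selected Postgres tables into ClickHouse via logical replication. A hedged sketch, where the host, database, credentials, and table list are placeholders and Postgres needs wal_level = logical:

```sql
-- Experimental feature flag; required before creating the database (self-managed ClickHouse).
SET allow_experimental_database_materialized_postgresql = 1;

-- Continuously replicate the 'iot_readings' table from Postgres into ClickHouse.
-- Host, database name, user, and password below are placeholders.
CREATE DATABASE pg_mirror
ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'iot_db', 'replication_user', 'secret')
SETTINGS materialized_postgresql_tables_list = 'iot_readings';
```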