Hacker News

neeleshs 11/07/2024

Congratulations! I was looking at pg_analytics from ParadeDB hoping this use case would be solved (the dump from pg to parquet part), but it doesn't do it yet.

How does it handle updates?


Replies

exAspArk 11/07/2024

Thank you!

The pg_analytics Postgres extension partially supports different file formats. We bet big on the Iceberg open table format, which uses Parquet data files under the hood.

Our initial approach is to do periodic full table resyncing. The next step is to support incremental Iceberg operations like updates. This will involve creating a new "diff" Parquet file and using the Iceberg metadata to point to the new file version that changes some rows. Later this will enable time travel queries, schema evolution, etc.
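To make the "diff file plus metadata pointer" idea concrete, here is a rough toy sketch in Python. It is an in-memory model only: `ToyTable`, its methods, and the file names are all hypothetical, plain lists stand in for Parquet files, and nothing here reflects Iceberg's real metadata layout or how the project actually implements it. Real Iceberg tracks files through manifests and expresses row-level changes with delete files or rewritten data files, which this sketch skips.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for an Iceberg-style table; every name here is hypothetical."""
    # file name -> list of row dicts (stands in for a Parquet data file)
    files: dict = field(default_factory=dict)
    # each snapshot is an ordered list of file names; the list index is the snapshot id
    snapshots: list = field(default_factory=list)

    def _commit(self, file_name, rows, base_files):
        # Store the new "file" and append a snapshot that references it.
        self.files[file_name] = rows
        self.snapshots.append(base_files + [file_name])
        return len(self.snapshots) - 1

    def full_resync(self, rows):
        # Periodic full resync: the new snapshot points only at a fresh full file.
        return self._commit(f"full-{len(self.snapshots)}.parquet", list(rows), [])

    def apply_updates(self, changed_rows):
        # Incremental update: write only the changed rows to a "diff" file and
        # commit a snapshot that references the previous files plus the diff.
        previous = list(self.snapshots[-1]) if self.snapshots else []
        return self._commit(f"diff-{len(self.snapshots)}.parquet", list(changed_rows), previous)

    def read(self, snapshot_id=-1, key="id"):
        # Merge files in order; rows in later files override earlier rows with the same key.
        merged = {}
        for name in self.snapshots[snapshot_id]:
            for row in self.files[name]:
                merged[row[key]] = row
        return sorted(merged.values(), key=lambda r: r[key])


table = ToyTable()
s0 = table.full_resync([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
s1 = table.apply_updates([{"id": 2, "name": "B"}])

latest = table.read()      # sees the updated row for id=2
as_of_s0 = table.read(s0)  # "time travel": the old snapshot still sees the original row
```

Because old snapshots keep pointing at the old files, reading an earlier snapshot id gives the earlier view of the table, which is the property that time travel queries rely on.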
