jconline543 · 11/08/2024

If you just need the Postgres -> S3 archival pattern you described, I built a simpler, more focused tool: pg-archiver (https://github.com/johnonline35/pg-archiver)

It:

- Auto-archives old Postgres data to Parquet files on S3

- Keeps recent data (default 90 days) in Postgres for fast viz queries

- Uses year/month partitioning in S3 for basic analytical queries

- Configures with just PG connection string and S3 bucket

It's currently batch-only archival (no real-time sync yet), and it's much lighter than running a full analytical DB if you mainly need time-series visualization with occasional historical analysis.
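To make the pattern concrete, here's a rough sketch of the archive step (not pg-archiver's actual code), assuming a hypothetical sensor_readings table with a recorded_at timestamp column, plus placeholder connection string and bucket name:

```python
# Sketch: copy rows older than the retention window into Parquet on S3
# under a year=/month= prefix. Table, column, DSN, and bucket are made up.
import io
from datetime import datetime, timedelta, timezone

import boto3
import pandas as pd
import psycopg2
import pyarrow as pa
import pyarrow.parquet as pq

PG_DSN = "postgresql://user:pass@localhost:5432/metrics"  # placeholder
BUCKET = "my-archive-bucket"                              # placeholder
RETENTION_DAYS = 90

cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)

with psycopg2.connect(PG_DSN) as conn:
    # Rows that have aged out of the "hot" window kept in Postgres.
    df = pd.read_sql(
        "SELECT * FROM sensor_readings WHERE recorded_at < %s",
        conn,
        params=(cutoff,),
    )

if not df.empty:
    s3 = boto3.client("s3")
    # Write one Parquet object per year/month partition.
    for (year, month), part in df.groupby(
        [df["recorded_at"].dt.year, df["recorded_at"].dt.month]
    ):
        key = (
            f"sensor_readings/year={year}/month={month:02d}/"
            f"archive-{cutoff:%Y%m%d}.parquet"
        )
        buf = io.BytesIO()
        pq.write_table(pa.Table.from_pandas(part), buf)
        s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue())

    # After the uploads succeed, a DELETE with the same cutoff trims
    # Postgres down to the recent window (omitted here).
```

Anything Parquet-aware (DuckDB, Athena, Spark, etc.) can then read the year=/month= prefixes directly.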

Let me know if you try it out!


Replies

oulipo · 11/08/2024

Really cool! I'll take a look!

- Can you then easily query it with DuckDB / ClickHouse / something else? What do you use yourself? Do you have a tutorial / toy example to check out? (There's a rough sketch of the kind of query I mean after these questions.)

- Would it be complicated to also have the real-time data stored on S3 somehow, so that it's "transparent" to query historical data that includes the current day's data?

- What typical batch size makes sense? I guess "day batches" might be a bit small and would incur too many read operations (if I have a moderate amount of daily data) compared to "week batches", but then the time lag increases?
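Roughly the kind of thing I have in mind, as a sketch only (placeholder bucket, table, column, and credential values, and assuming the year=/month= Parquet layout described above):

```python
# Query the archived Parquet on S3 with DuckDB, and union it with the
# live Postgres table so hot + historical reads look like one query.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("""
    CREATE SECRET s3_creds (
        TYPE S3,
        KEY_ID 'AKIA...',        -- placeholder credentials
        SECRET 'xxx',
        REGION 'us-east-1'
    );
""")

-- = historical tier: hive-partitioned Parquet files on S3 =
con.execute("""
    CREATE VIEW readings_cold AS
    SELECT * FROM read_parquet(
        's3://my-archive-bucket/sensor_readings/*/*/*.parquet',
        hive_partitioning = true
    );
""")

# Hot tier: the recent ~90 days still living in Postgres.
con.execute("""
    ATTACH 'dbname=metrics host=localhost user=me password=secret'
        AS pg (TYPE POSTGRES, READ_ONLY);
""")

# One "transparent" view over both tiers (drop the hive partition
# columns so the column lists line up for UNION ALL).
con.execute("""
    CREATE VIEW readings AS
    SELECT * EXCLUDE (year, month) FROM readings_cold
    UNION ALL
    SELECT * FROM pg.public.sensor_readings;
""")

# Example: a daily average spanning both hot and archived data.
print(con.execute("""
    SELECT date_trunc('day', recorded_at) AS day, avg(value) AS avg_value
    FROM readings
    WHERE recorded_at >= now() - INTERVAL 180 DAY
    GROUP BY day
    ORDER BY day;
""").fetchdf())
```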
