If you just need the Postgres -> S3 archival pattern you described, I built a simpler, focused tool: pg-archiver (https://github.com/johnonline35/pg-archiver)
It:
- Auto-archives old Postgres data to Parquet files on S3
- Keeps recent data (default 90 days) in Postgres for fast viz queries
- Uses year/month partitioning in S3 for basic analytical queries
- Configures with just a PG connection string and an S3 bucket
Currently batch-only archival (no real-time sync yet). Much lighter than running a full analytical DB if you mainly need timeseries visualization with occasional historical analysis.
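If it helps to see the shape of the pattern, here's a minimal sketch of the same Postgres -> Parquet -> S3 flow in Python. This is not pg-archiver's actual code; the table ("metrics"), timestamp column ("ts"), and bucket name are made up for illustration:

```python
# Minimal sketch of the batch archival pattern: copy rows older than the
# retention window to year/month-partitioned Parquet on S3, then delete them
# from Postgres. All names here are hypothetical.
import boto3
import pandas as pd
import psycopg2

PG_DSN = "postgresql://user:pass@host/db"   # hypothetical connection string
BUCKET = "my-archive-bucket"                # hypothetical bucket
CUTOFF = "now() - interval '90 days'"       # keep the last 90 days in Postgres

conn = psycopg2.connect(PG_DSN)
df = pd.read_sql(f"SELECT * FROM metrics WHERE ts < {CUTOFF}", conn)

# Write one Parquet file per year/month partition and upload it to S3.
s3 = boto3.client("s3")
year = df["ts"].dt.year.rename("year")
month = df["ts"].dt.month.rename("month")
for (y, m), chunk in df.groupby([year, month]):
    key = f"metrics/year={y}/month={m:02d}/part-0.parquet"
    chunk.to_parquet("/tmp/part-0.parquet", index=False)   # needs pyarrow
    s3.upload_file("/tmp/part-0.parquet", BUCKET, key)

# Only delete the archived rows once every upload has succeeded.
with conn, conn.cursor() as cur:
    cur.execute(f"DELETE FROM metrics WHERE ts < {CUTOFF}")
conn.close()
```

The sketch hard-codes the 90-day window just to show the flow; in the tool the retention window is the configurable default mentioned above.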
Let me know if you try it out!
Really cool! I'll take a look!
- Can you then easily query it with DuckDB / ClickHouse / something else? What do you use yourself? Do you have a tutorial / toy example to check? (My rough guess at a DuckDB query is below.)
- Would it be complicated to also store the real-time data on S3 somehow, so that it's "transparent" to run a query over historical data that includes today's data?
- What typical batch size makes sense? I'd guess "day batches" might be a bit small and incur too many read operations (if I have a moderate amount of daily data) compared to "week batches", but then the time lag increases?
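On the first question, here's roughly what I'd hope works against the year/month layout, via DuckDB's Python API. The bucket, paths, and column names are pure guesses on my part:

```python
# Guessing the layout is s3://<bucket>/metrics/year=YYYY/month=MM/*.parquet
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region = 'us-east-1';")   # credentials via env / AWS config

# Monthly aggregate over the archived Parquet partitions.
monthly = con.sql("""
    SELECT year, month, avg(value) AS avg_value
    FROM read_parquet('s3://my-archive-bucket/metrics/*/*/*.parquet',
                      hive_partitioning = true)
    GROUP BY year, month
    ORDER BY year, month
""").df()
print(monthly)

# For the "transparent" recent + historical case, I imagine one could ATTACH
# the live Postgres table with DuckDB's postgres extension and UNION ALL it
# with the Parquet scan in a view -- curious whether that's what you do.
```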