Hacker News

jconline543 | 11/08/2024 | 0 replies

Also, for querying both recent and historical data together, you wouldn't need to modify this tool at all. You could just add a separate periodic job (e.g. hourly/daily) that copies recent data to S3:

    COPY (SELECT * FROM iot_data WHERE timestamp > current_date - INTERVAL '90 days')
    TO 's3://bucket/recent/iot_data.parquet' (FORMAT 'parquet');

Then query everything together in DuckDB:

    SELECT * FROM read_parquet([
      's3://bucket/year=*/month=*/iot_data_*.parquet',  -- archived data
      's3://bucket/recent/iot_data.parquet'             -- recent data
    ]);

Much simpler than implementing real-time sync, and you still get a unified view of all your data for analysis (just with a small delay on recent data).