Hacker News

cjonas today at 1:09 PM

We just create mini data "ponds" on the fly by copying tenant-isolated, gold-tier data to Parquet in S3. The user/agent queries are executed with DuckDB. We run this process when the user starts a session and generate an STS token scoped to their tenant's bucket path. It's extremely simple and works well (at least with our data volumes).
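The tenant scoping in this setup hinges on attaching an inline session policy when minting the STS token, so the credentials can only read one tenant's prefix. A minimal sketch of building such a policy (bucket name, tenant ID, and function name are illustrative, not from the comment):

```python
import json

def tenant_session_policy(bucket: str, tenant_id: str) -> str:
    """Build an inline IAM session policy that restricts a set of STS
    credentials to a single tenant's prefix in the data-pond bucket."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow reading only objects under this tenant's prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{tenant_id}/*",
            },
            {
                # Allow listing the bucket, but only within that prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{tenant_id}/*"]}},
            },
        ],
    })

# Example: a policy scoped to one tenant's slice of the pond bucket.
policy = tenant_session_policy("gold-tier-ponds", "tenant-42")
```

The resulting JSON string would be passed as the `Policy` parameter to `sts.assume_role` (boto3), and the returned temporary credentials handed to the DuckDB session for its S3 reads; the effective permissions are the intersection of the role's policy and this session policy.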


Replies

mattaitken today at 2:17 PM

This is cool. I think for our use case this wouldn’t work. We’re dealing with billions of rows for some tenants.

We’re about to introduce alerts where users can write their own TRQL queries and then define alerts from them. That requires evaluating them regularly, so the data effectively needs to be continuously up to date.

Waterluvian today at 1:20 PM

Is that why it’s called DuckDb? Because data ponds?

otterley today at 2:18 PM

How large are these data volumes? How long does it take to prepare the data when a customer request comes in?

boundlessdreamz today at 1:26 PM

How do you copy all the relevant data? Doesn't this create unnecessary load on your source DB?
