We just create mini data "ponds" on the fly by copying tenant-isolated gold-tier data to Parquet in S3. The user/agent queries are executed with DuckDB. We run this process when the user starts a session and generate an STS token scoped to their tenant's bucket path. It's extremely simple and works well (at least at our data volumes).
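For anyone curious what that looks like in practice, here's a rough sketch of the per-session part, assuming the gold data already lives under `s3://analytics-gold/<tenant_id>/*.parquet`. The bucket name, role ARN, and setting names are my placeholders, not necessarily the commenter's actual setup:

```python
import json


def tenant_scoped_policy(bucket: str, tenant_id: str) -> dict:
    """Inline session policy that narrows the assumed role to a single
    tenant's prefix, so the STS token can only read that tenant's data."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{tenant_id}/*"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{tenant_id}/*"]}},
            },
        ],
    }


def start_session(tenant_id: str):
    """Hypothetical session setup: mint a tenant-scoped STS token, then hand
    the temporary credentials to DuckDB. Requires boto3 + duckdb and real
    AWS resources, so it's illustrative only."""
    import boto3
    import duckdb

    creds = boto3.client("sts").assume_role(
        RoleArn="arn:aws:iam::123456789012:role/duckdb-reader",  # placeholder
        RoleSessionName=f"tenant-{tenant_id}",
        Policy=json.dumps(tenant_scoped_policy("analytics-gold", tenant_id)),
        DurationSeconds=3600,
    )["Credentials"]

    con = duckdb.connect()
    con.execute("INSTALL httpfs; LOAD httpfs;")  # S3 support
    con.execute(f"SET s3_access_key_id='{creds['AccessKeyId']}'")
    con.execute(f"SET s3_secret_access_key='{creds['SecretAccessKey']}'")
    con.execute(f"SET s3_session_token='{creds['SessionToken']}'")
    # Queries then read only from the tenant's prefix, e.g.:
    #   con.sql("SELECT * FROM 's3://analytics-gold/<tenant>/orders/*.parquet'")
    return con
```

The nice property is that even if the query layer has a bug, the STS token itself can't read outside the tenant's prefix.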
How large are these data volumes? How long does it take to prepare the data when a customer request comes in?
How do you copy all the relevant data? Doesn't this create unnecessary load on your source DB?
This is cool, but I don't think it would work for our use case. We're dealing with billions of rows for some tenants.
We're also about to introduce alerts, where users can write their own TRQL queries and define alerts on top of them. That requires evaluating the queries regularly, so the data effectively needs to be continuously up to date.