> This is the part that doesn’t demo well. ETL pipelines feeding into BigQuery from every operational system: Salesforce, Zendesk, and a dozen other internal tools. dbt transformations that normalize and document the data. Column-level descriptions for every table in the warehouse, because an AI agent that doesn’t know what a column means will write SQL that looks right and returns wrong numbers.
I'm glad they called this out. For the first half of this, I kept thinking: "Either your answers are confidently wrong or you've done a ton of prep work to let your AIs be effective BI analysts." Sounds like it's the latter, and they're well aware of it!
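For anyone who hasn't used dbt: the column-level descriptions the article mentions typically live in dbt schema files. A minimal sketch (table and column names here are made up, not from the article):

```yaml
# models/schema.yml — hypothetical example of the metadata layer described
version: 2

models:
  - name: support_tickets
    description: "One row per Zendesk ticket, refreshed hourly."
    columns:
      - name: ticket_id
        description: "Zendesk ticket primary key."
      - name: resolved_at
        description: "UTC timestamp when the ticket was closed; NULL if still open."
```

Feeding these descriptions into the agent's context is what keeps it from, say, treating a NULL `resolved_at` as missing data instead of "still open".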
Shameless plug: my org does this, and we deactivated our Slack server to dogfood it.
By building Hasura [0], we already had the ability to generate data catalogs and a metadata layer from DBs and APIs, so the foundational infra was there.
> When a question touches restricted data — student PII, sensitive HR information — the agent doesn’t just refuse. It explains what it can’t access and proposes a safe reformulation. "I can’t show individual student names, but here’s the same analysis using anonymized IDs."
This part is scary. It implies that if I'm in a department that shouldn't have access to this data, the AI will still run the query for me and then do some post-processing to "anonymize" the data. This isn't how security is supposed to work... did we learn nothing from SQL injection?
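To make the concern concrete: the safe design is to enforce the policy in front of the query, so restricted columns are rejected before any data is fetched, rather than fetched and then "anonymized". A minimal sketch (the policy table and function names are hypothetical, not from the article):

```python
# Hypothetical access-control gate: deny restricted columns BEFORE the
# query runs, instead of post-processing results the caller shouldn't see.

RESTRICTED_COLUMNS = {"students": {"name", "email"}}  # assumed policy table


def check_query(role_allowed: set[str], table: str, columns: list[str]) -> list[str]:
    """Return the requested columns if the role may select them all;
    raise PermissionError if any restricted column is not granted."""
    restricted = RESTRICTED_COLUMNS.get(table, set())
    blocked = [c for c in columns if c in restricted and c not in role_allowed]
    if blocked:
        raise PermissionError(f"columns {blocked} are restricted for this role")
    return columns
```

The "safe reformulation" the article describes should then mean swapping `name` for an anonymized ID in the SQL itself, so raw PII never reaches the agent, not stripping it from rows that were already queried.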
Just going to say it... no mention of handling the security aspects of this. Scary.
This is cool, I should say, but I would be really worried about the security aspects. Prompt injection here could be really painful.
We tried something similar at a previous company — ended up with 3 different bots all answering slightly differently depending on which doc chunk they hit. The consistency problem is real.
Curious how you handle updates. Like if someone edits the source doc, does the bot just start returning different answers or is there a review step?
We're a 30-person ed-tech company. I built a Slack bot that connects our data warehouse, 250k Google Drive files, support tickets, and codebase so anyone on the team can ask it a question and get a sourced answer back. The bot took two and a half weeks to build; the data infrastructure under it took two years. Wrote up the architecture, where trust breaks down, and what I'd build first if starting over.
She opens Slack, types a question to our internal agent: “Give me the names of all employees who have recently complained about my leadership”
Prior to having such a product, it was such a chore for her to track down all the people who may have objected (dispassionately or otherwise) to my plans, strategies, objectives, etc.
is it just me or was the scrollbar purposefully hidden on this site? in chrome on windows, i found it very jarring and user-hostile to NOT know how far along i was in reading the article.
i make a judgement call early on: is this worth my time? my whole article calculation algo was thrown off by this.
do not like.
data engineering is all you need.
everything else is smoke
all ai applications are smoke and will be obsolete in a year
do not be deceived
> The data infrastructure underneath it took two years.
yep, that's what Definite is for: https://www.definite.app/
All the data infra (datalake + ELT/ETL + dashboards) you need in 5 minutes.
Off meta: are we tired of the "one single product that solves everything" pitch that every single AI product has become?
grep didn't try to also do what awk does, and jq and curl did exactly what they needed to do without wanting to become an OS (looking at you, emacs). can we have that in the AI world? I hope/think we will, in a few years, once this century's iteration of the FSF catches up.