Remember that many people are heavily happy-path biased. They see a good result once and say "that's it, ship it!"
I'm sure they QA'd it, but QA was probably "does this give me good results" (almost certainly 'yes' with an LLM), not "does this consistently not give me bad results".
Agreed. I just read this paper by AWS's Ahmed El-Deeb:
https://dl.acm.org/doi/epdf/10.1145/3780063.3780066 (the PDF loads slowly...)
> almost certainly 'yes' with an LLM
LLMs can handle search because search is intentionally garbage now, and because they can absorb that content into their training set.
Asking highly specific questions about NYC governance, which can change daily, is almost certainly 'not' going to give you good results with an LLM. The technology is not well suited to this particular problem.
Meanwhile, if an LLM actually did give you good results, it's an indication that the city is so bad at publishing information that citizens cannot reasonably discover it on their own. This is a fundamental problem and should be solved instead of layering a $600k, barely working "chat bot" on top of the mess.