Hacker News

Databases Were Not Designed for This

72 points by mooreds, last Friday at 11:41 PM, 67 comments

Comments

dherls, today at 4:26 PM

Giving LLM agents direct, autonomous write access to a real production database seems insane to me.

NO ONE, agent or human, should have direct write access to production databases outside of emergency break-glass scenarios. This is why we have stored routines and API layers: to pre-define what writes are allowed. The fact that agents CAN autonomously write to a database does not mean that they should.

As for the point about query optimization: again, your agents should not be issuing random queries against a production database. We have had separate analytics databases, with architectures built to support exploratory queries, for decades.
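A minimal sketch of the "pre-defined writes" idea, with made-up table and operation names (Python's stdlib sqlite3 standing in for the production database): the caller gets a handful of vetted operations, never a raw connection.

```python
import sqlite3

# Hypothetical write-API layer: the caller (agent or human) never sees the
# connection, only these pre-defined operations -- the "stored routine" idea
# expressed in application code.
class OrdersWriteAPI:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def cancel_order(self, order_id: int) -> bool:
        # The only UPDATE this caller may perform: a constrained status
        # transition, never an arbitrary write.
        cur = self._conn.execute(
            "UPDATE orders SET status = 'cancelled' "
            "WHERE id = ? AND status = 'open'",
            (order_id,),
        )
        self._conn.commit()
        return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'open'), (2, 'shipped')")

api = OrdersWriteAPI(conn)
print(api.cancel_order(1))  # True: allowed transition
print(api.cancel_order(2))  # False: shipped orders cannot be cancelled
```

The agent can be handed `api` and nothing else; every write it can cause is enumerable in advance.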

ripped_britches, today at 5:54 PM

I’m all-in on agents but this is a “you’re holding it wrong” situation.

If you want to give your agents a DB for their own work as a scratchpad or something that’s great. They can not only go to town, but also analyze their own work and iterate on it.

If you are talking about a production database, agents should not be hitting it directly under any circumstances. There needs to be an API layer with defined usage patterns, rate limits, etc.
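For the rate-limit part, one common shape is a token bucket in front of the agent-facing API; this is a generic sketch, not any particular library's API.

```python
import time

# Hypothetical token-bucket rate limiter for an agent-facing API layer:
# each caller gets a budget of requests that refills over time.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=0.5)
# First three calls are allowed; further calls are throttled until refill.
print([bucket.allow() for _ in range(5)])
```

An agent whose key exhausts its bucket gets a clean "slow down" response instead of free rein over the database.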

This is basically the same as saying “databases weren’t designed for interns to run live inline migrations in prod”. Yeah of course they aren’t.

bloaf, today at 5:21 PM

> The API failed silently because the database connection pool was exhausted downstream.

I work with a team that does stuff like this, returning a 200 and a body containing "error: I didn't do what you said because _insert error here_"

The problem is that you returned OK instead of ERROR when things were not OK and there was an ERROR.

It's a design that smells of teams trying to hit some kind of internal metric by slightly deceptive means.
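The anti-pattern and its fix, side by side (framework-free sketch with an invented pool-exhaustion flag):

```python
import json

# Anti-pattern: always return 200, bury the failure in the body.
def handle_bad(pool_exhausted: bool):
    if pool_exhausted:
        return 200, json.dumps({"error": "connection pool exhausted"})
    return 200, json.dumps({"result": "ok"})

# Fix: the status line carries the truth, so callers, retry logic,
# and monitoring all see the failure without parsing the body.
def handle_good(pool_exhausted: bool):
    if pool_exhausted:
        return 503, json.dumps({"error": "connection pool exhausted"})
    return 200, json.dumps({"result": "ok"})

print(handle_bad(True)[0])   # 200 -- looks healthy to every dashboard
print(handle_good(True)[0])  # 503 -- alarms fire as they should
```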

iambateman, today at 4:44 PM

Giving an LLM write access is insane, but I gave LLMs read-only access to our database and it's been a huge productivity win.

Executives who wouldn’t take the time to build a report are happy to ask an AI agent to do so.

Alex_L_Wood, today at 4:34 PM

This article has all the correct conclusions and solutions, based on one assumption that has no hold in reality: that someone would be insane enough to allow an AI agent direct DB access.

aleda145, today at 4:56 PM

I totally agree on investing in a sane data model upfront. So many production systems have schemas that only made sense to the engineer who created them. I would be delighted if I could read a schema and understand what a column means without having to dig through a bunch of migration PRs.

I recently encountered `is_as BOOL` in an important table. After way too much invested time, we found out it meant "is active service". </DDL rant>

hasyimibhartoday at 6:12 PM

I'm not sure why you are giving your agents access to query your OLTP database, let alone write to it. The pattern that I use at work is:

- Read access through OLAP, not OLTP. You just need to set up near-real-time replication between your OLTP and OLAP.

- Write access through an API, just like your application. You can add fancy things like an approval layer, e.g. your agent cannot call "ban_user(id)", but it can call "request_to_ban_user(id)", and the action only happens once you approve it.
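The approval layer can be sketched in a few lines; all names here are hypothetical, and the "real write" is just an in-memory log for illustration.

```python
# Hypothetical approval layer: the agent can only *request* a destructive
# action; nothing happens until a human approves it.
pending = {}        # request_id -> description of the requested action
approved_log = []   # stands in for the real, audited write path

def request_to_ban_user(user_id: int) -> int:
    """Agent-facing call: records intent, returns a ticket, has no effect."""
    request_id = len(pending) + 1
    pending[request_id] = f"ban_user({user_id})"
    return request_id

def approve(request_id: int) -> None:
    """Human-facing call: only now does the real write happen."""
    action = pending.pop(request_id)
    approved_log.append(action)

rid = request_to_ban_user(42)   # agent asks; no user is banned yet
print(approved_log)             # []
approve(rid)                    # human signs off
print(approved_log)             # ['ban_user(42)']
```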

sgarland, today at 6:55 PM

The article describes idempotency keys and then completely misses making them the PK. The example is already using UUIDv4 as a PK, so they're clearly not optimizing for performance. If you're using the first 32 characters of a SHA-256 hash, congratulations: that's 16 bytes, so store it as BINARY(16) / BYTEA. It's half the size of the hex encoding, to boot.

Also, the DB will most certainly not silently ignore a unique constraint violation: it will send an error back. EDIT: unless you’re using INSERT OR IGNORE, of course.
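Both points can be seen in a few lines of sqlite3 (table and helper names are made up): the idempotency key is the primary key, stored as raw bytes, and a duplicate insert raises rather than being silently ignored.

```python
import hashlib
import sqlite3

# Idempotency key as the primary key, stored as 16 raw bytes
# instead of 32 hex characters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (idem_key BLOB PRIMARY KEY, amount INT)")

def record_payment(request_body: bytes, amount: int) -> bool:
    key = hashlib.sha256(request_body).digest()[:16]
    try:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (key, amount))
        return True
    except sqlite3.IntegrityError:
        # The DB does NOT silently swallow the duplicate: it raises.
        # (INSERT OR IGNORE / ON CONFLICT DO NOTHING would suppress this.)
        return False

print(record_payment(b"req-1", 100))  # True: first attempt inserts
print(record_payment(b"req-1", 100))  # False: the retry hits the PK constraint
```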

lateforwork, today at 4:45 PM

There are two broad types of databases: operational and analytical.

Operational databases store transactions and support day-to-day application workflows.

For analysis, data is often copied into separate analytical databases (data warehouses), which are structured for efficient querying and large-scale data processing. These systems are designed to handle complex, random queries and heavy workloads.

LLM agents are the best way to analyze data stored in these databases. This is the future.

zarzavat, today at 6:06 PM

I don't understand the premise. Who is letting "agents" run arbitrary SQL against their database without human review?

Before redesigning your database, consider seeing a psychiatrist.

PunchyHamster, today at 5:42 PM

> Never let an agent hard-delete anything. Use soft deletes as a baseline for any table an agent can write to

How do you even enforce it?

And why are you giving the agent access to a live DB in the first place?
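As to enforcement: it can live in the database itself rather than in convention. A sketch using a sqlite3 trigger (in Postgres you would more likely just REVOKE DELETE from the agent's role); the table name is invented.

```python
import sqlite3

# Enforce "no hard deletes" at the database layer: a trigger aborts any
# DELETE, forcing callers down the soft-delete path (set deleted_at).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT);
CREATE TRIGGER no_hard_delete BEFORE DELETE ON notes
BEGIN
    SELECT RAISE(ABORT, 'hard deletes are disabled; set deleted_at instead');
END;
""")
conn.execute("INSERT INTO notes (body) VALUES ('keep me')")

try:
    conn.execute("DELETE FROM notes WHERE id = 1")
except sqlite3.DatabaseError as e:
    print(e)  # hard deletes are disabled; set deleted_at instead

# The sanctioned path: mark the row, don't remove it.
conn.execute("UPDATE notes SET deleted_at = datetime('now') WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 1: row survives
```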

red_admiral, today at 6:47 PM

At one place, the saying was that databases can handle everything except a class of sophomores learning how to use databases.

brunkerhart, today at 6:15 PM

Databricks introduced Lakebase exactly for this purpose. Lakebase supports branching that makes querying, schema evolution and writes cheap. Each agent can run its own branch without affecting production data.

ak217, today at 5:34 PM

> Connections are Brief

This doesn't make sense in the context of the author's chosen example (Postgres). Postgres connections are very heavy, there is a huge performance penalty for cycling them quickly, and a whole range of silly workarounds exists for this fact (pgbouncer). Maybe the author meant to say that sessions are brief.

shmeeny, today at 5:34 PM

I think the spirit of this post has merit, but the premise is flawed. ORMs have been causing this same class of problem for decades. Furthermore, it's not at all uncommon for humans to create different queries for the same result, and for those queries to follow different review paths against the same underlying database.

setr, today at 6:21 PM

> The contract goes something like this: the caller is a human-authored application, running deterministic code, issuing predictable queries, reviewed by a developer before deployment. Writes are intentional. Connections are brief. When something goes wrong, a human notices. The database can be dumb and fast because the application layer is smart and careful.

This assumption is that of a non-DBA who happened to get a hold of a database.

When you have sufficient users, your expected set of queries is a complete shit show. Some will be efficient, many will be poorly tested and psychotic, and indistinguishable from a non-deterministic LLM.

Also who said humans can’t query the database directly in prod? If not direct sql access, business users have the next best thing with custom reports and such. And they will very much ask for ridiculous amounts of computation to answer trivial questions.

It was a foundational assumption of SQL that business users would directly access the database and write their own queries.

It’s why row-level access and permissions exist. Use them.

bijowo1676, today at 5:46 PM

Think of a flat database table as a projection (of all previous SQL queries). Should you give agents access to it?

Probably not; maybe only for analytical (OLAP) purposes in read-only mode.

For transactional OLTP loads, it is better to use Kafka-style durable queues: have agents create a change record to mutate the state, but never touch the projection itself, which can be recomputed at an arbitrary point in time via a time-travel mechanism, branched out into different versions, etc.
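A toy version of that change-record pattern (all names invented): agents only ever append to the log, and the projection is a pure function of it, so any historical state can be rebuilt.

```python
from dataclasses import dataclass

# Hypothetical event log: agents propose change records; the table-like
# "projection" is recomputed from the log rather than mutated in place.
@dataclass
class Change:
    user_id: int
    field: str
    value: str

log = []

def propose(change: Change) -> None:
    log.append(change)  # the only write agents are allowed

def project(upto=None):
    """Fold the first `upto` changes (all of them if None) into a state dict."""
    state = {}
    for change in log[:upto]:
        state.setdefault(change.user_id, {})[change.field] = change.value
    return state

propose(Change(1, "name", "Ada"))
propose(Change(1, "name", "Ada L."))
print(project())        # {1: {'name': 'Ada L.'}}
print(project(upto=1))  # time travel: {1: {'name': 'Ada'}}
```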

pilgrim0, today at 4:29 PM

Who the hell lets agents use a database directly? Even humans don’t get this privilege. So, of all things, have we forgotten how to write APIs now? The article suggests creating a role for the agent directly in the database. What is wrong with you people? The very title of the article defeats its own purpose: they are not designed for this, so don’t let them be used like this, ffs.

mdavid626, today at 6:41 PM

Jesus Christ. All this is true only if you let them.

It’s a good idea to be defensive and design the system in such a way that it can “fix” itself.

But for the love of god, don’t let an LLM do everything it wants.

diavelguru, today at 5:41 PM

Agree across the board.

croisillon, today at 5:15 PM

I wonder if that guy gets email requests via his "talks" section.

lowsong, today at 6:33 PM

> None of this requires new technology. It requires treating the database as a defensive layer that assumes the caller might be wrong, might retry, and might not be watching the results.

This is one of those takes that is so close to understanding the problem, and then drawing an insane conclusion.

The problem is that AI agents and the code they output are untrustworthy, buggy, insecure, and lacking in any of the standards the industry has developed over the last 30 years. The solution to this is "don't use AI agents", not "change the rest of the stack to accommodate garbage".

stavros, today at 6:09 PM

I'm exasperated whenever I read articles like this. Anyone who underscores the difference between humans and agents by saying "[agents] write based on their current understanding of the task, which may be wrong" is clearly working with a different species of human than the one I've worked with.

efficax, today at 5:08 PM

Why are you connecting your agent to a database with write access? Are you out of your mind?