Hacker News

Postgres transactions are a distributed systems superpower

125 points by KraftyOne yesterday at 6:38 PM | 59 comments

Comments

mrkeen yesterday at 9:14 PM

I walked away from a job interview a few years ago on this point.

One of the technical questions was "if you have a db and a message queue, how do you get your update to alter both or neither (i.e. transactionally)"?

I thought about it for a couple of minutes, then came back with something like "I can't, and you can't either." Then I proposed the usual spiel about using a replicated-state-machine/write-ahead-log/event-sourcing (whatever it might be called at the time) and leaning into eventual consistency as the only practical solution.

He asked if I'd heard about the outbox pattern, so I let him describe it. Sure enough it sounded like this article. The secret to transacting across the database D and the message queue Q:

  (D,Q)
is to split D into two parts (the State and the Outbox), transact across those instead:

  (S,O)    Q
and then just pretend that you have a transaction across D and Q.
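
For readers who have not seen the outbox pattern described above, here is a minimal sketch. It assumes Python's built-in `sqlite3` as a stand-in for Postgres so that it runs without a server, and the table names (`accounts`, `outbox`) and functions (`debit`, `relay`) are hypothetical, not taken from the article. The state S and the outbox O commit together; a separate relay then copies outbox rows to Q, which gives at-least-once delivery rather than a true transaction across D and Q.

```python
import json
import sqlite3

# Assumption: sqlite3 stands in for Postgres so the sketch runs anywhere.
# Table and function names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)")
with db:
    db.execute("INSERT INTO accounts VALUES ('alice', 100)")


def debit(account, amount):
    """Update the state S and append to the outbox O in one transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account),
        )
        (balance,) = db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        event = {"event": "debited", "account": account, "amount": amount}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish):
    """Copy outbox rows to the queue Q.

    A row is deleted only after publish() returns, so a crash between the
    two re-sends it on the next run: at-least-once, not exactly-once.
    """
    rows = db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(payload)
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))


queue = []  # stand-in for the message queue Q
debit("alice", 30)
try:
    debit("alice", 500)  # rolled back: no balance change and no outbox row
except ValueError:
    pass
relay(queue.append)
print(queue)
```

The point of contention in the comment still stands in this sketch: nothing here is atomic across the database and the queue. The consumer on the Q side has to tolerate duplicates.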

munk-a yesterday at 8:58 PM

We've leveraged the atomicity of transactions with a fail-safe approach for external service interactions for client email sending. This could certainly be done with a formal queue, though it would operate very similarly and achieve the same guarantees we have today (and it was built when we were too small to justify such an infra spend). Internally we have jobs that execute complex logic to transform data from a pending state to a computed state. They lean on the DB's atomicity to guarantee that data is successfully transitioned, and those tasks are all incredibly resilient. But when a secondary persistence store is involved, transactional guarantees need to be compromised in some manner. In our email sending example, we hold that guaranteeing a client receives every notification matters more than guaranteeing each notification is sent precisely once, so our sending mechanism is to confirm that the email was sent successfully and only then close a transaction that removes that message from the pending list.

There will always be a window for potential loss due to solar flares/whatever but the key in designing a system like this is to make sure you're aware of how the system can fail, accept that outcome and then work to, as much as possible, shrink the distance in cycles/logic between each persistence committal. Logic should be front-loaded to do as much prep work as possible before any irreversible actions happen and then those irreversible actions should be ordered to your preference and dispatched as quickly and cheaply as possible in a safe manner.
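
The send-then-commit ordering described above can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's actual system: `sqlite3` stands in for the real database, and `pending_email`, `drain_pending`, and `flaky_send` are hypothetical names. The simulated crash lands in the window between the irreversible send and the commit, which is exactly where a duplicate appears.

```python
import sqlite3

# Assumption: sqlite3 stands in for the real database; names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pending_email (id INTEGER PRIMARY KEY, recipient TEXT NOT NULL)")
with db:
    db.executemany(
        "INSERT INTO pending_email (recipient) VALUES (?)",
        [("a@example.com",), ("b@example.com",)],
    )


def drain_pending(send_email):
    """Send each pending email, and only then commit its removal."""
    rows = db.execute("SELECT id, recipient FROM pending_email ORDER BY id").fetchall()
    for row_id, recipient in rows:
        send_email(recipient)  # irreversible external action comes first
        with db:               # then the short, cheap commit
            db.execute("DELETE FROM pending_email WHERE id = ?", (row_id,))


delivered = []
crash_once = {"armed": True}


def flaky_send(recipient):
    """Records the delivery, then simulates a crash before the commit (once)."""
    delivered.append(recipient)
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated crash after send, before commit")


try:
    drain_pending(flaky_send)
except RuntimeError:
    pass
drain_pending(flaky_send)  # the retry re-sends the first message

print(delivered)  # ['a@example.com', 'a@example.com', 'b@example.com']
```

Nothing is lost, and the first recipient gets the message twice. Reversing the two operations (commit the removal, then send) would flip the trade-off to at-most-once.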

jdw64 yesterday at 8:40 PM

So my understanding is that they're aligning the workflow progression unit and the database commit unit on a one-to-one basis. In other words, each step in the workflow becomes a database commit unit. That's why the outbox pattern gets simplified. But in exchange, the database itself becomes tightly coupled to the workflow, which will make it architecturally difficult to separate later on. Although, to be fair, I almost never actually need to separate the database anyway.
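
One way to picture "each workflow step is a database commit unit" is the sketch below. It is an interpretation under assumptions rather than the article's implementation: `sqlite3` stands in for Postgres, and `workflow_steps`, `run_workflow`, `reserve`, and `flaky_bill` are hypothetical names. Each step's writes and its checkpoint row commit together, so a re-run skips completed steps and a failed step leaves no partial writes.

```python
import sqlite3

# Assumption: sqlite3 stands in for Postgres; names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
db.execute(
    "CREATE TABLE workflow_steps ("
    "workflow_id TEXT, step INTEGER, PRIMARY KEY (workflow_id, step))"
)
with db:
    db.execute("INSERT INTO inventory VALUES ('widget', 10)")


def run_workflow(workflow_id, steps):
    """Run steps in order; a step's writes and its checkpoint commit together."""
    for step_no, step in enumerate(steps):
        done = db.execute(
            "SELECT 1 FROM workflow_steps WHERE workflow_id = ? AND step = ?",
            (workflow_id, step_no),
        ).fetchone()
        if done:
            continue  # committed by an earlier run, so skip it
        with db:  # one transaction per step
            step()
            db.execute(
                "INSERT INTO workflow_steps VALUES (?, ?)", (workflow_id, step_no)
            )


attempts = {"n": 0}


def reserve():
    db.execute("UPDATE inventory SET qty = qty - 1 WHERE item = 'widget'")


def flaky_bill():
    # Writes first, then fails on the first attempt to simulate a crash.
    db.execute("UPDATE inventory SET qty = qty - 1 WHERE item = 'widget'")
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated crash mid-step")


try:
    run_workflow("wf-1", [reserve, flaky_bill])
except RuntimeError:
    pass
qty_after_crash = db.execute("SELECT qty FROM inventory").fetchone()[0]

run_workflow("wf-1", [reserve, flaky_bill])  # resumes at the failed step
qty_after_resume = db.execute("SELECT qty FROM inventory").fetchone()[0]

print(qty_after_crash, qty_after_resume)  # 9 8
```

The coupling the comment worries about is visible here: the checkpoint table lives in the same database as the business data, and the guarantee depends on that.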

In most services, I often swap out the message broker or the workflow engine, but the database almost always stays the same.

I'm not sure if I've understood this correctly.

cloudie78 yesterday at 8:00 PM

Congratulations, you discovered a mutex.

Is it really a distributed system or just a bunch of services with a central database?

Crowberry yesterday at 9:11 PM

We’ve got an in-house pubsub solution that lives in the main application’s database, so pretty much exactly as described in the article. And the atomicity it allows is indeed really nice!

zadikian today at 12:30 AM

The part about durable workflows is technically correct, but it's focusing on different things than what I've ever run into in practice. Any mildly complex system will have side effects outside your DB, and then you want idempotency. If you have no side effects, you probably don't need a durable workflow in the first place? Maybe there's a more concrete example.

I have rolled my own little durable workflows in Postgres before, in fact before I even knew durable workflows were a thing with solutions like Temporal. That's fine for many cases where you aren't doing enough steps for it to be tedious, and/or you want permanent records. Would do it again, but not for atomicity reasons.

Other comments have already discussed the issue with the outbox UDF: your external system has to poll and retry either way. It works, though. Maybe I'm misunderstanding this?
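
As a rough illustration of the idempotency point for side effects outside the DB: the sketch below uses an idempotency key so that a retried step does not repeat the external action. Everything in it is hypothetical (`FakePaymentProvider`, `charge_step`, the key format); it assumes the external system deduplicates on a caller-supplied key, which a given real service may or may not support.

```python
import uuid


class FakePaymentProvider:
    """Hypothetical external API that honours caller-supplied idempotency keys."""

    def __init__(self):
        self.charges = {}

    def charge(self, idempotency_key, amount_cents):
        # A repeated key returns the original result instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {
                "charge_id": str(uuid.uuid4()),
                "amount_cents": amount_cents,
            }
        return self.charges[idempotency_key]


def charge_step(provider, workflow_id, step_no, amount_cents):
    """Derive the key from durable workflow identity so a retry reuses it."""
    key = f"{workflow_id}:{step_no}"
    return provider.charge(key, amount_cents)


provider = FakePaymentProvider()
first = charge_step(provider, "wf-42", 3, 1999)
second = charge_step(provider, "wf-42", 3, 1999)  # retry after a crash

print(first == second, len(provider.charges))  # True 1
```

The key has to come from something that survives a restart (here the workflow ID and step number). A key generated fresh on each attempt would defeat the purpose.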

aynyc yesterday at 9:25 PM

OK. I've read it a few times and still don't understand. Where is the distributed part? You store data in a single transaction into postgres. What/who is notifying the message queue?

cadamsdotcom today at 1:05 AM

The coolest thing about using Postgres for everything is that when the database works, everything works, and when the database goes down, it all goes down, so you get to fix nothing most days and then everything all at once.

zyngaro yesterday at 10:57 PM

The article is riddled with misconceptions. Have you guys ever heard of the CAP theorem? Distributed systems suck, so let's implement a non-distributed one. The title is also misleading: Postgres transactions are not distributed.

hoppp yesterday at 9:36 PM

Just start writing stored functions already.

evilturnip yesterday at 8:12 PM

Can you use postgres as a state store for a distributed application?

It seems this article is trending toward that view: If you can maintain transactional consistency along with application workflow state, then would this generalize to maintaining distributed application state in general?

The follow-up would be: Would this be preferable to Valkey/Redis?

bsaul yesterday at 8:06 PM

I don't understand the last point about the UDF. Either you need the state to be updated atomically across different systems or you don't. But writing a row in one system in order to update the second one at any random time in the future isn't really much different from enqueuing a job in a queue.

lima yesterday at 10:08 PM

[dead]