Hacker News

What if database branching was easy?

24 points | by tee-es-gee | last Saturday at 6:29 AM | 13 comments

Comments

sgarland | today at 1:44 PM

> Imagine you need to add an index to a table with a few million rows. On a seeded database with 200 rows, the migration runs in milliseconds. Obviously. But on a branch with realistic data, it takes 40 seconds and needs CREATE INDEX CONCURRENTLY to avoid locking the table. The branch is isolated, so locking there isn't the issue — the point is that the rehearsal shows the production migration would need CONCURRENTLY.

A few million rows should take at most, on the most awful networked storage available, maybe 10 seconds. I just built an index locally on 10,000,000 rows in 4 seconds. More to the point, though, there are vanishingly few cases where you wouldn't want to use CONCURRENTLY in prod - you shouldn't need to run a test to tell you that.
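For context, a minimal sketch of the two index-build variants being discussed (table and column names are hypothetical, not from the article):

```sql
-- Plain CREATE INDEX holds a lock that blocks writes to the table
-- for the duration of the build:
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- CONCURRENTLY builds the index without blocking writes, at the cost
-- of a slower build; it cannot run inside a transaction block:
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);
```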

IMO branching can be a cool feature, but the use I keep seeing touted (indexes) doesn't seem like a good one for it. You should have a pretty good idea how an index is going to behave before you build it, just from understanding the RDBMS. There are also tools like hypopg [0], which are also available on cloud providers.

A better example would be showing testing a large schema change, like normalizing a JSON blob into proper columns or something, where you need to validate performance before committing to it.

0: https://github.com/HypoPG/hypopg
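A rough sketch of the hypopg workflow the comment refers to, assuming Postgres with the extension available (table and column names hypothetical):

```sql
-- Install the extension (offered by several managed Postgres providers)
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Register a hypothetical index: nothing is written to disk, it
-- exists only in this session's view of the planner
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (user_id)');

-- EXPLAIN (without ANALYZE) will now consider the hypothetical index,
-- so you can see whether the planner would actually use it
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
```

This answers "would the planner use this index?" instantly, without the build cost the index example above incurs.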

mwexler | today at 2:06 PM

I thought this was a repost, but I was thinking of Dolt, which has similar capabilities but a different approach. As in https://news.ycombinator.com/item?id=38109880

sastraxi | today at 1:04 PM

I’ve done experiments using BTRFS and ZFS for local Postgres copy-on-write. You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.
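A minimal sketch of the template-clone approach the comment describes, assuming Postgres 15 or newer (database names hypothetical):

```sql
-- Clone an existing database as a branch. STRATEGY FILE_COPY copies
-- the data files directly instead of replaying them through WAL,
-- which pairs well with snapshot/clone support in a copy-on-write
-- filesystem like BTRFS or ZFS:
CREATE DATABASE mydb_branch
  TEMPLATE mydb
  STRATEGY FILE_COPY;
```

The source database must have no active connections while it is being used as a template.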

Looking at Xata’s technical deep dive, the site claims that we need an additional Postgres instance per replica and proposes a network file system to work around that. But I don’t really understand why that’s needed. Can someone explain my misunderstanding here?

show 1 reply
comrade1234 | today at 12:48 PM

I was on a big team working on a giant Oracle database over 25 years ago. I don't remember the term, but each developer had their own playground of the giant database that wasn't affected by anyone else. The DB admin would set it up for each developer in just a few minutes, so it definitely wasn't a copy. Then when a developer needed to reset and go back to the original DB, that also just took a few minutes. I just don't remember what it's called, but I think Postgres has had it now for a few years.

show 2 replies
Nihilartikel | today at 12:55 PM

This kind of magic is the reason that I'm very itchy to be able to line up real work on Datomic or XTDB someday.

show 1 reply
theaniketmaurya | today at 12:57 PM

I was using Neon, which had a similar feature, but I'm now using PlanetScale. I'd be curious to know how you all are doing it?

show 1 reply