Hacker News

luhn · today at 1:19 AM · 11 replies

> By Jan 2024, our largest table had roughly 100 million rows.

I did a double take at this. At the outset of the article, the fact that they're using a distributed database, plus the mention of a "mid 6 figure" DB bill, made me assume they had some obscenely large database far beyond what a single node could do. They don't detail the Postgres setup that replaced it, so I assume it's a pretty standard single primary, and a 100 million row table is well within its abilities: I have a 150 million row table happily plugging along on a 2 vCPU + 16 GB instance. Apples and oranges, perhaps, but people shouldn't underestimate what a single modern server can do.
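For what it's worth, checking where a table actually sits is cheap on any stock Postgres instance. A quick sketch, assuming a table named events (the name is just a placeholder):

```sql
-- Estimated row count from planner statistics (cheap; no full scan):
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'events';

-- Total on-disk size of the table, including its indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('events'));
```

Depending on row width, the result is often far smaller than people expect.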


Replies

thehappyfellow · today at 9:56 AM

It’s incredible how much Postgres can handle.

At $WORK, we write ~100M rows per day and keep years of history, all in a single database. Sure, the box is big, but I have beautiful transactional workloads and no distributed systems to worry about!
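For scale, ~100M rows per day is only about 1,150 rows per second on average. A hedged sketch of what keeps that cheap in plain Postgres; the measurements table and its columns are invented for illustration, not the actual schema:

```sql
-- Multi-row INSERTs amortize round trips and per-statement overhead:
INSERT INTO measurements (device_id, recorded_at, value)
VALUES (1, now(), 42.0),
       (2, now(), 17.5),
       (3, now(), 99.9);

-- For sustained bulk ingest, COPY is typically much faster than row-at-a-time INSERTs:
COPY measurements (device_id, recorded_at, value)
FROM '/var/tmp/measurements.csv' WITH (FORMAT csv);
```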

sgarland · today at 1:07 PM

Agreed. Devs usually do a double take when I tell them that their table with 100K rows is not in fact big, or even medium. Everyone's experience is different, of course, but to me, big starts somewhere in the high hundreds of millions of rows. Past a billion it doesn't really matter; the difference between 1 billion and 5 billion isn't important, because it's exceedingly unlikely that (a) your working set is that large, or (b) your server could cope with all of it at once. I hope you have partitions.
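A minimal sketch of what those partitions usually look like in Postgres (declarative range partitioning by month; the events table and its columns are invented for illustration):

```sql
CREATE TABLE events (
    user_id     bigint       NOT NULL,
    created_at  timestamptz  NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries that filter on created_at are pruned to the matching partitions,
-- so the working set stays a small fraction of the total row count.
```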

hliyan · today at 2:02 AM

Call me old fashioned, but when records start reaching the 100 million range, it's usually an indication that either your dataset is too wide (consider sharding) or too deep (consider time-based archival) to fit into a monolithic schema. For context, I dealt with multiple systems that generated this volume of data between 2003 and 2013 (mostly capital markets, but also some govt/compliance work), with databases and hardware from that era, and we rarely had an issue that could not be solved by query optimization, caching, sharding, or archival, usually in that order.

Secondly, we did most of these things using SQL, Bash scripts, cron jobs, and some I/O logic built directly into the application code. They were robust enough to handle some extremely mission-critical systems (a failure could bring down a US primary market, and if it was bad enough, you'd hear about it on the news).
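For illustration, the kind of time-based archival pass a nightly cron job could drive might look like this in plain SQL; the trades tables and the two-year cutoff are invented, not the actual systems described above:

```sql
-- Move cold rows into an archive table, then trim the hot table, atomically.
BEGIN;

INSERT INTO trades_archive
SELECT * FROM trades
WHERE trade_date < now() - interval '2 years';

DELETE FROM trades
WHERE trade_date < now() - interval '2 years';

COMMIT;
```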

icedchai · today at 1:38 AM

You don't even need to be that "modern." Back in 2010 I was working on a MySQL 5.x system with about 300 million rows on a dual Xeon box with 16 gigs RAM and a few hundred gigs of RAID 10. This was before SSDs were common.

The largest table was over 100 million rows. Some migrations were painful, however. At that time, some of them would lock the whole table and we'd need to run them overnight. Fortunately, this was for an internal app so we could do that.
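For comparison, on the Postgres side (not the MySQL 5.x setup described above), the usual way to avoid that kind of table lock today is to build indexes concurrently; the index and table names are invented:

```sql
-- Builds the index without blocking concurrent writes.
-- Slower than a plain CREATE INDEX and cannot run inside a transaction block,
-- but it avoids the overnight-lock problem.
CREATE INDEX CONCURRENTLY idx_orders_created_at
    ON orders (created_at);
```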

throwaway7783 · today at 2:18 AM

Yeah, we have 300M+ rows in a table as well. It's partitioned by time and chugs along with no issues. Granted, it's a 30 vCPU, 100 GB RAM machine, but it hosts billions of rows in aggregate.

williamdclt · today at 9:12 AM

Yeah, 100M is really not that much. I worked on a 10B-row table on an RDS r6g.4xl, and Postgres handled it fine, even with 20+ indexes. Not ideal, and I'd rather have fewer indexes and shard the table, but Postgres dealt with it.
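As a rough illustration of the "fewer indexes" instinct, stock Postgres statistics views can show which indexes are never scanned and how much space each one takes; nothing here is specific to that particular table:

```sql
SELECT relname        AS table_name,
       indexrelname   AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC,
         pg_relation_size(indexrelid) DESC;
```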

thomasfromcdnjs · today at 1:32 AM

Does mid six figures mean ~$500k?

That sounds insane for a CRUD app with one million users.

What am I missing?

h1fra · today at 10:44 AM

I also screamed in my head: I have way more rows than that in a Postgres database right now, and I'm paying less than $500!

ringeryless · today at 6:21 AM

OTOH, they are admittedly using an ORM (Prisma, known for its weight).

sethammons · today at 10:11 AM

I missed that number, but I caught that they migrated all their data in 15 minutes, and I blinked: "wait, how little data are we talking about, for how much money!?"

casper14 · today at 1:24 AM

Nice! What optimizations have you put in place to support 150 million rows? Just some indexing, or other fancy stuff?
