logoalt Hacker News

sgarlandyesterday at 11:39 PM6 repliesview on HN

It is forever enraging to me that ORMs turn SELECT * into each individual column, mostly because people then post the whole thing and it’s obnoxiously large.

Similarly maddening, the appalling lack of normalization that is simply taken for granted. “It’s faster, bro.” No, no, it is not. Especially not at the hundreds of millions or billions of rows scale. If you store something low-cardinality like a status column, with an average length of perhaps 7 characters, that’s 8 bytes (1 byte overhead assumed, but it could be 2). Multiply that by 2 billion rows, and you’re wasting 16 GB. Disk is cheap, but a. Memory isn’t b. Don’t be lazy. There’s a right way to use an RDBMS, and a wrong way. If you want a KV store, use a damn KV store.

Finally, I’d be remiss if I failed to point out that Prisma is an unbelievably immature organization who launched without the ability to do JOINS [0]. They are forever dead to me for that. This isn’t “move fast and break things,” it’s “move fast despite having zero clue what we’re doing but convince JS devs that we do.”

[0]: https://github.com/prisma/prisma/discussions/19748


Replies

bastawhizyesterday at 11:50 PM

> ORMs turn SELECT * into each individual column

This is a safety feature. If my code expects columns A, B, and C, but the migration to add C hasn't run yet and I'm doing something that would otherwise `SELECT `, my query should fail. If the ORM _actually_ does `SELECT ` I'll get back two columns instead of three and things can get spooky and bad real fast (unless the ORM manually validates the shape of the query response every time, which will come with a real runtime cost). If there are columns that the ORM doesn't know about, you could end up with _far more_ data being returned from the database, which could just as easily cause plenty of spooky issues—not the least of which being "overwhelming your network by flooding the client connections with data the application doesn't even know exists".

show 1 reply
compton93yesterday at 11:56 PM

I worked for startup who did all of these things on CockroachDB. We could of used a single m5.xlarge PostgreSQL instance (1000 basic QPS on 150GB of data) if we optimized our queries and went back to basics, instead we had 1TB of RAM dedicated to Cockroach.

I added about 4 indexes and halved the resources overnight. But Prisma, SELECT *, graphql and what other resume building shit people implemented was the bane of my existence, typically engineers did this believing it would be faster. I remember 1 engineer had a standing ovation in slack for his refactor which was supposedly going to save us $$$$$ except our DB CPU went up 30% because he decided to validate every company every second in every session. In his defense, he added 1 line of code that caused it, and it was obscured through prisma and graphql to an inefficient query.

FWIW; I love CockroachDB but the price is directly linked to how much your software engineers shit on the database.

reissbakertoday at 12:13 AM

Eh, I've run applications on RDBMSes with multi-billion-row tables, and I've never found normalization/denormalization to be particularly impactful on performance except for in a few rare cases. The biggest impact came from sensible indexing + query patterns. Normalization vs denormalization had a big impact on convenience, though (not always favoring one way or the other!).

But I'm no fan of Prisma either. Drizzle has its own pain points (i.e. sequential numbers for its auto-generated migrations means annoying merge conflicts if multiple people iterate on the schema at the same time), but it's much better than Prisma at sticking close to the metal and allowing good query performance and table design.

show 1 reply
amazingamazingyesterday at 11:47 PM

I don't disagree with your point, but over normalization and joining everywhere also isn't necessarily the answer, even with an index. there's no easy answer to this, really depends on the performance characteristics the critical user journeys need.

with a little pain, if I had to pick an extreme, I'd pick extreme normalization with materialized views that are queried (e.g. no joins), rather than joining all of the time.

show 2 replies
HappyJoyyesterday at 11:53 PM

Spelling out columns can help the query optimizer too

thr0wyesterday at 11:56 PM

> Disk is cheap, but a. Memory isn’t

This isn't said enough.