I’ve done a lot of interviewing and I’ve discovered that many devs (even experienced ones) don’t understand the difference between indexes and foreign keys.
My assumption is that people have used orms that automatically add the index for you when you create a relationship so they just conflate them all. Often they’ll say that a foreign key is needed to improve the performance and when you dig into it, their mental model is all wrong. The sense they have is that the other table gets some sort of relationship array structure to make lookups fast.
It’s an interesting phenomenon of the abstraction.
Don’t get me wrong, I love sqlalchemy and alembic but probably because I understand what’s happening underneath so I know the right way to hold it so things are efficient and migrations are safe.
Huh, that's interesting. Mixing indexes and FKs is a major conceptual error.
FWIW, I've also asked everyone I've interviewed in the past decade about indexes and FKs. Most folks I've talked to seem to understand FKs. They're often fuzzier on the details of indexes, but I don't recall anyone conflating the two.
> their mental model is all wrong.
Is it? In Postgres, all FK references must be to a column with a PK or unique constraint or part of another index. Additionally, Postgres and Maria (maybe all SQL?) automatically create indexes for PKs and unique constraints. There's a high likelihood that a foreign key is already indexed _in the other table_.
Generally, I agree with your statement. Adding a FK won't magically improve performance or create useful indices. But, the presence of a FK or refactoring to support a FK does (tangentially) point back to that index.
Very strange if you ask me and disturbing. I don't know if I'd let such a dev touch a database. Of course nowadays we just vibe code and YOLO everything, but still. This is making me feel old.
An index is one thing (and important and good), but an FK allows you to completely eschew IO if done right. In other words "I guarantee that all values in this list exist in that list" is a great simple optimization path and some sql engines can use it to avoid joining data or checking for existence at all.
During a work meeting I once suggested using a non-PK column in a Postgres database for a foreign key. A coworker confidently said that we shouldn't because joins would be slow. I pointed out that we could create an index on that column and they rebutted by claiming that PKs created some kind of "special" index. I didn't want to burn goodwill and so didn't push it further but it always struck me as silly.
Depending upon the database storage engine, available memory, and table size I could see there being _some_ performance hit if only PKs are used for statistics but I'd think that modern RDBMSes are smart enough to cache appropriately. Am I missing something?