> But SQL schemas often look like this. Columns are nullable by default, and wide tables are common.
Hard disagree. That database table was a waving red flag. I don't know enough/any rust so don't really understand the rest of the article but I have never in my life worked with a database table that had 700 columns. Or even 100.
OLTP tables typically are normalized.
But OLAP tables (data lake/warehouse stuff), for speed purposes, are intentionally denormalized and yes, you can have 100+ columns of nullable stuff.
I saw tables with more than a thousand columns. It was a law firm home-grown FileMaker tool. Didn't inspect it too closely, so don't know what was inside
I remember a phrase from one of C. J. Date's books: every record is a logical statement. It really stood out for me and I keep returning to it. Such an understanding implies a rather small number of fields or the logical complexity will go through the roof.
If lots of columns are a red flag then red flags are quite common in many businesses. I’ve seen tables with tens of thousands of columns. Naturally those are not used by humans writing sql by hand, but there are many tools that have crazy data layouts and generate crazy sql to work with it.
Hi, I'm the author of the article.
As to your hard disagree, I guess it depends... While this particular user is on the higher end (in terms of columns), it's not our only user where column counts are huge. We see tables with 100+ columns on a fairly regular basis especially when dealing with larger enterprises.
It might not be common in typical software shops. I work in manufacturing and our database has multiple tables with hundreds of columns.
No idea what these guys do exactly but their tagline says "Feldera's award-winning incremental compute engine runs SQL pipelines of any complexity"
So it sounds like helping customers with databases full of red flags is their bread and butter
https://apps.naaccr.org/data-dictionary/data-dictionary/vers...
771 columns (and I've read the definitions for them all, plus about 50 more that have been retired). In the database, these are split across at least 3 tables (registry, patient, tumor). But when working with the records, it's common to use one joined table. Luckily, even that usually fits in RAM.
Not everyone understands normal form, much less 3rd normal form. I’ve seen people do worse with excel files where they ran out of columns and had to link across spreadsheets.
It's OLAP, it very common for analytical tables to be denormalized. As an example, each UserAction row can include every field from Device and User to maximize the speed at which fraud detection works. You might even want to store multiple Devices in a single row: current, common 1, 2 and 3.
It is very common to find tables with 1000+ columns in machine learning training sets at e-commerce companies. The largest I have seen had over 10000 columns.
Salesforce by default comes with some where your tables have 50 columns before you start tweaking anything.
100s is not unusual. Thousands happens before you realise.
That statement jumped out at me as well. I've worked as a DBA on tons of databases backing a wide variety of ERPs, web apps, analytics, data warehouses...700 columns?!? No.
I have seen tables (SQL and parquet, too) that have at least high hundreds of optional columns, but this was always understood to be a terrible hack, in those cases.
> Hard disagree. That database table was a waving red flag.
Exactly this.
This article is not about structs or Rust. This article is about poor design of the whole persistence layer. I mean, hundreds of columns? Almost all of them optional? This is the kind of design that gets candidates to junior engineer positions kicked off a hiring round.
Nobody gets fired for using a struct? If it's an organization that tolerates database tables with nearly 1k optional rows then that comes at no surprise.
[dead]
Some businesses are genuinely this complicated. Splitting those facts into additional tables isn't going to help very much unless it actually mirrors the shape of the business. If it doesn't align, you are forcing a lot of downstream joins for no good reason.