Hacker News

LeCompteSftware, today at 8:50 AM

There's a contradiction here that needs to be untangled:

  There are many examples of models that enable coherent systems within specific domains:

  - Type systems in programming languages catch many logic errors and interface misuses

  - The relational model in databases enables programmers to access incredible scale and performance with minimal effort.

  [...]

  So coherent systems are great: everyone should just buy into whatever model will most effectively do the job. Right? Unfortunately, the listed models are all domain-specific–they don’t generalize to other contexts. And most modern internet software is not domain-specific. Modern applications typically span a wide variety of domains, including web and API serving, transaction processing, background processing, analytical processing, and telemetry. That means that trying to keep a system coherent limits what that system can ultimately do. As one implements more capabilities, application requirements push us outside of a single domain, forcing us to reach for components with a different internal model. So, bit by bit, our system fragments.

The problem of course is that type systems and databases are not meaningfully "domain-specific." They aren't technical magic bullets but they separately provide real value for the use cases of "web and API serving, transaction processing, background processing, analytical processing, and telemetry." So then why hasn't the industry settled on a specific type system? Why do database vendors (and the SQL standard) keep breaking the relational model in favor of something ad hoc and irritating?

I believe the real problem is that software is symbolic and the problems it solves usually aren't. Writing an application means committing to a certain set of symbolic axioms and derivation schemas, and these are never going to encapsulate the complexity of the real world. This relates to Greenspun's 10th rule:

  Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Or, in a modern context: a C++/C# program that manages a huge amount of configuration data with a janky JSON/XML parser, often gussied up as an "entity component system" in game development or a "DSL" in enterprise. The entirely equivalent alternative is a huge amount of (deterministic!) compile-time code generation. Any specific symbolic system small enough to be useful to humans is eventually going to go "out of sync" with the real world.

The authors hint at this with the discrepancy between SQL's type system and that of most programming languages, but that discrepancy is a historical artifact. The real problem is that language designers make different tradeoffs when designing their type systems, and I believe this tradeoff is essentially fundamental. Lisp is a dynamically-typed s-expression parser, and Lisp programs benefit from being able to quickly and easily deal with an arbitrary tree of whatever objects. In C#/C++ you would either have to do some painful generics boilerplate (likely codegen with C#) or box everything as System.Object / void pointer and actually lose some of the type safety that Lisp provides. OTOH Idris and Lean can do heterogeneous lists and trees a little more easily, but they pay for it badly in compilation times, and AFAICT they'll still demand irritating "mother may I?" boilerplate to please the typechecker. There is a fundamental tradeoff that seems innate to the idea of communicating with relatively short strings of relatively few symbols.
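A rough sketch of the "arbitrary tree of whatever objects" point, with Python standing in for Lisp. The names `Tree`, `flatten`, and `config` are invented for illustration. The runtime code needs no ceremony to walk a mixed tree; the `Tree` annotation is roughly the union a static language would make you spell out (or abandon in favor of a boxed `Object`).

```python
from typing import List, Union

# Hypothetical types for illustration: a node is either an atom
# or a list of further nodes, as in an s-expression.
Atom = Union[int, float, str]
Tree = Union[Atom, List["Tree"]]


def flatten(tree: Tree) -> List[Atom]:
    """Collect the atoms of a tree depth-first, dispatching on runtime type."""
    if isinstance(tree, list):
        out: List[Atom] = []
        for child in tree:
            out.extend(flatten(child))
        return out
    return [tree]


# A made-up configuration tree mixing strings, ints, and floats.
config = ["window", ["size", 800, 600], ["title", "demo"], ["scale", 1.5]]
print(flatten(config))
```

Nothing stops a caller from putting an unsupported object into the tree at runtime, which is the flexibility-versus-checking tradeoff in miniature.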

This sounds like Gödel incompleteness, and it's a related idea. But this has more to do with cognition and linguistics. I wish I was able to write a little more coherently about this... I guess I should collect some references and put together a blog at some point.


Replies

Toutouxc, today at 10:10 AM

> The problem of course is that type systems and databases are not meaningfully "domain-specific." They aren't technical magic bullets but they separately provide real value for the use cases of "web and API serving, transaction processing, background processing, analytical processing, and telemetry." So then why hasn't the industry settled on a specific type system? Why do database vendors (and the SQL standard) keep breaking the relational model in favor of something ad hoc and irritating?

I'm not sure what point you're trying to make here. The list you're referring to is definitely a bit hand-wavy, but it also makes sense to me to read it as, for example, "today's relational databases (software) are almost perfectly aligned to the domain of relational databases (concept)". As in, MariaDB running on my Mac wraps an insane amount of complexity and smarts in a very coherent system that only exposes a handful of general concepts.

The concepts don't match what I'd like to work with in my Rails app, which makes the combination of both a "fragmented system", as the article calls it, but the database itself, the columns, tables, rows and SQL above it all, that's coherent and very powerful.
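To illustrate that mismatch, here is a small sketch that swaps in Python and an in-memory SQLite database for Rails and MariaDB. The `authors` and `posts` tables and the `by_author` variable are made up for the example. The database side is coherent on its own terms (tables, rows, a join), but the application has to reshape the flat rows into the nested structure it actually wants.

```python
import sqlite3

# Assumed toy schema, not taken from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'alice');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second');
    """
)

# The relational side answers with flat rows, one per (author, post) pair.
rows = conn.execute(
    "SELECT a.name, p.title FROM authors a "
    "JOIN posts p ON p.author_id = a.id ORDER BY p.id"
).fetchall()

# The application side wants a nested shape: each author with their posts.
by_author = {}
for name, title in rows:
    by_author.setdefault(name, []).append(title)

print(rows)
print(by_author)
conn.close()
```

The regrouping loop is the kind of glue an ORM hides; it sits exactly at the seam between the two models.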
