Hacker News

api · last Sunday at 8:20 PM · 5 replies

An open secret in our field is that the current market-leading OSes and (to some extent) system architectures are antiquated and sub-optimal at their foundations due to backward-compatibility requirements.

If we started greenfield today and managed to mitigate second system syndrome, we could design something faster, safer, overall simpler, and easier to program.

Every decent engineer and CS person knows this. But it’s unlikely to happen, for two reasons.

One is that doing it while avoiding second system syndrome takes teams with a huge amount of both expertise and discipline. That includes the discipline to be ruthless about exterminating complexity and saying no. That’s institutionally hard.

The second is that there isn’t strong demand. What we have is good enough for what most of the market wants, and right now all the demand for new architecture work is in the GPU/NPU/TPU space for AI. Nobody is interested in messing with the foundation when all the action is there. The CPU in that world is just a job manager for the AI tensor math machine.

Quantum computing will be similar. QC will be controlled by conventional machines, making the latter boring.

We may be past the window where rethinking architectural choices is possible. If you told me we’d still have Unix in 2000 years, I would consider it plausible.


Replies

nine_k · last Sunday at 8:27 PM

Aerospace, automotive, and medical devices represent strong demand. They sometimes use and run really interesting stuff, because backwards-compatibility pressure there is much weaker and the cost of software malfunction is very high. Your onboard engine control system can run an OS based on seL4 with software written in Ada SPARK, or something. Nobody would bat an eye; nobody needs to run 20-year-old third-party software on it.

matu3ba · last Monday at 10:11 AM

> we could design something faster, safer, overall simpler, and easier to program

I do remain doubtful about this for general-purpose computing: hardware built for low latency/high throughput is at odds with full security (the absence of observable side channels). Optimal latency/throughput requires time-constrained, hardware-level programming with FPGAs, or building custom hardware (at high cost), usually programmed via dedicated hardware/software or via things like system-bypass solutions. And simplicity is at odds with generality; compare weak/strong formal systems vs. strong/weak semantics.

If you factor those compromises in, you’ll end up with roughly the current state, plus historical mistakes like the missing vertical integration of software stacks above kernel space as the TCB, bad APIs due to missing formalization, CHERI with its current shortcomings, etc.
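
As a concrete instance of the latency-versus-side-channel tension described above (a minimal C sketch added for illustration, not part of the original comment): an early-exit comparison is faster on average but leaks the position of the first mismatch through its timing, while a constant-time comparison gives up that shortcut.

    #include <stddef.h>
    #include <stdint.h>

    /* Early-exit compare: fast on average, but its running time depends on
       where the first mismatch occurs, which an attacker can observe. */
    int fast_compare(const uint8_t *a, const uint8_t *b, size_t n) {
        for (size_t i = 0; i < n; i++) {
            if (a[i] != b[i])
                return 0;  /* data-dependent branch: the timing side channel */
        }
        return 1;
    }

    /* Constant-time compare: always touches every byte and has no
       secret-dependent branches, trading speed for the absence of
       a timing channel. */
    int ct_compare(const uint8_t *a, const uint8_t *b, size_t n) {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }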

I do expect things to change once security requirements make a mandatory security processor more common, leading to multi-CPU solutions and the potential for developers to use both complex and simple CPUs on the same system, meaning roughly time-accurate ones, virtual and/or real.

> The second is that there isn’t strong demand.

This is not true for virtualization and security use cases, though it’s not obvious yet because widespread attacks are still missing; see the side-channel leaks of cloud solutions. Take a look at the growth of hardware security module vendors.

themafia · last Monday at 12:19 AM

> That includes the discipline to be ruthless about exterminating complexity and saying no. That’s institutionally hard.

You need to make a product that outperforms your competitors. If their chip is faster, then your work will be ignored regardless of how pure you managed to keep it.

> We may be past the window where rethinking architectural choices is possible.

I think your presumption that our architectures are extremely sub-optimal is wrong. They're exceptionally optimized. Just spend some time thinking about branch prediction and register renaming. It's a steep cliff for any new entrant. You not only have to produce something novel and worthwhile but you have to incorporate decades of deep knowledge into the core of your product, and you have to do all of that without introducing any hardware bugs.

You stand on the shoulders of giants and complain about the style of their footwear.
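
To make the branch-prediction point concrete, here is a toy C benchmark (an illustrative sketch, not the commenter's code): the same loop over the same values runs noticeably faster once the data is sorted, simply because the branch becomes predictable. Exact numbers vary with compiler and flags, since an optimizer may replace the branch with a conditional move.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)

    /* Sum the elements above a threshold; the branch is either highly
       predictable (sorted input) or essentially random (unsorted input). */
    static long long sum_above(const int *v, int n, int threshold) {
        long long sum = 0;
        for (int i = 0; i < n; i++)
            if (v[i] > threshold)   /* the branch the predictor must guess */
                sum += v[i];
        return sum;
    }

    static int cmp_int(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    int main(void) {
        int *v = malloc(N * sizeof *v);
        if (!v) return 1;
        for (int i = 0; i < N; i++)
            v[i] = rand() % 256;

        clock_t t0 = clock();
        long long s1 = sum_above(v, N, 128);   /* random order: mispredictions */
        clock_t t1 = clock();

        qsort(v, N, sizeof *v, cmp_int);

        clock_t t2 = clock();
        long long s2 = sum_above(v, N, 128);   /* sorted: predictable branch */
        clock_t t3 = clock();

        printf("unsorted: %lld in %.3fs, sorted: %lld in %.3fs\n",
               s1, (double)(t1 - t0) / CLOCKS_PER_SEC,
               s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
        free(v);
        return 0;
    }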

foobarian · last Sunday at 8:50 PM

> something faster

How true is this, really? When does the OS kernel take up more than a percent or so of a machine's resources nowadays? I think the problem is that there is so little juice there to squeeze that it's not worth the huge effort.
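
One rough way to sanity-check that number on a Linux box (a minimal sketch, assuming /proc/stat is available): the first line of /proc/stat gives cumulative CPU counters since boot, and the "system" field approximates the kernel's share. top and mpstat report the same split.

    #include <stdio.h>

    /* Rough, Linux-only check: read the aggregate CPU time counters from
       /proc/stat and print what fraction has gone to kernel ("system")
       code versus user code since boot. */
    int main(void) {
        unsigned long long user, nice, sys, idle, iowait, irq, softirq;
        FILE *f = fopen("/proc/stat", "r");
        if (!f) { perror("/proc/stat"); return 1; }
        if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &sys, &idle, &iowait, &irq, &softirq) != 7) {
            fclose(f);
            return 1;
        }
        fclose(f);

        unsigned long long total = user + nice + sys + idle + iowait + irq + softirq;
        printf("kernel (system) share since boot: %.2f%%\n",
               100.0 * (double)sys / (double)total);
        printf("user share since boot:            %.2f%%\n",
               100.0 * (double)(user + nice) / (double)total);
        return 0;
    }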

int_19h · last Monday at 12:21 AM

The thing about AI, though, is that it has indirect effects down the line. E.g., as the prevalence of AI-generated code increases, I would argue that we'll need more guardrails, both in development (to ground the model) and at runtime (to ensure that when it still fails, the outcome is not catastrophic).