Hacker News

m132 · yesterday at 6:31 PM · 10 replies

Projects like this and Docker make me seriously wonder where software engineering is going. Don't get me wrong, I don't mean to criticize Docker or Toro in particular. It's the increasing dependency on such approaches that bothers me.

Docker was conceived to solve the problem of things "working on my machine", and not anywhere else. This was generally caused by the differences in the configuration and versions of dependencies. Its approach was simple: bundle both of these together with the application in unified images, and deploy these images as atomic units.
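That "atomic unit" idea is easy to see in a minimal Dockerfile. (A hypothetical Python app; `app.py` and `requirements.txt` are made-up names for illustration.)

```dockerfile
# The base image pins the dependency versions; the COPY lines bundle
# the app and its configuration. The result ships as one versioned unit.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```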

Somewhere along the line, however, the problem has mutated into "works on my container host". How is that possible? Turns out that with larger modular applications, the configuration and dependencies naturally demand separation. This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.
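That "network of inter-dependent containers" typically looks something like this hypothetical docker-compose.yml (service names and images are invented for illustration):

```yaml
# Three containers that must all be wired together before anything works:
# the app's dependency graph has moved up out of the image layer.
services:
  web:
    image: example/web:1.0        # hypothetical frontend image
    depends_on: [api]
    ports: ["8080:80"]
  api:
    image: example/api:1.0        # hypothetical backend image
    depends_on: [db]
    environment:
      DATABASE_URL: postgres://db:5432/app
  db:
    image: postgres:16
    volumes: ["dbdata:/var/lib/postgresql/data"]
volumes:
  dbdata:
```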

Now hardware virtualization. I like how AArch64 generalizes this: there are 4 levels of privilege baked into the architecture. Each has control over the lower and can call up the one immediately above to request a service. Simple. Let's narrow our focus to the lowest three: EL0 (classically the user space), EL1 (the kernel), EL2 (the hypervisor). EL0, in most operating systems, isn't capable of doing much on its own; its sole purpose is to do raw computation and request I/O from EL1. EL1, on the other hand, has the powers to directly talk to the hardware.
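As a rough illustration (a sketch, not a complete program), crossing those levels on AArch64 looks like this; the syscall-number convention shown is Linux's:

```asm
// EL0 -> EL1: a system call traps into the kernel
mov x8, #64      // syscall number for write(2) on AArch64 Linux
mov x0, #1       // fd = stdout
svc #0           // supervisor call: trap to EL1

// EL1 -> EL2: the kernel requests a hypervisor service
hvc #0           // hypervisor call: trap to EL2 (when EL2 is in use)
```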

Everyone is happy, until the complexity of EL1 grows out of control and becomes a huge attack surface, difficult to secure and easy to exploit from EL0. Not good. The naive solution? Go a level above, and create a layer that will constrain EL1, or actually, run multiple, per-application EL1s, and punch some holes through for them to still be able to do the job—create a hypervisor. But then, as those vaguely defined "holes", also called system calls and hypercalls, grow, won't the attack surface grow too?

Or in other words, with the user space shifting to EL1, will our hypervisor become the operating system, just like docker-compose became a dynamic linker?


Replies

nine_k · yesterday at 10:55 PM

I see a number of assumptions in your post that don't match my view of the picture.

Containers arose as a way to solve the dependency problems created by traditional Unix. They grew from tools like chroot, BSD jails, and Solaris Zones. Containers allow deploying dependencies that cannot be simultaneously installed on a traditional Unix host system. It's not a Unix architecture limitation but rather a result of POSIX plus tradition; e.g. Nix also solves this, but differently.

Containers (like chroot and jail before them) also help ensure that a running service does not depend on the parts of the filesystem it wasn't given access to. Additionally, containers can limit network access, and process tree access.

These limitations are not a proper security boundary, but definitely a dependency boundary, helping avoid spaghetti-style dependencies, and surprises like "we never realized that our ${X} depends on ${Y}".

Then, there's the Fundamental Theorem of Software Engineering [1], which states: "We can solve any problem by introducing an extra level of indirection." So yes, expect the number of levels of indirection to grow everywhere in the stack. A wise engineer can expect to merge or remove some levels here and there, when the need for them is gone, but they would never expect new levels of indirection to stop emerging.

[1]: https://en.wikipedia.org/wiki/Fundamental_theorem_of_softwar...

eyberg · yesterday at 6:54 PM

Containers got popular at a time when an increasing number of people were finding it hard to install software on their system locally - especially if you were, for instance, having to juggle multiple versions of Ruby or multiple versions of Python, and those linked to various major versions of C libraries.

Unfortunately containers have always had an absolutely horrendous security story and they degrade performance by quite a lot.

The hypervisor is not going away anytime soon - it is what the entire public cloud is built on.

While you are correct that containers do add more layers - unikernels go the opposite direction and actively remove those layers. Also, imo the "attack surface" is by far the smallest security benefit - other architectural concepts, such as the complete lack of an interactive userland, are far more beneficial when you consider what an attacker actually wants to do after landing on your box (e.g. run their software).

When you deploy to AWS you have two layers of linux - one that AWS runs and one that you run - but you don't really need that second layer and you can have much faster/safer software without it.

zozbot234 · yesterday at 9:23 PM

> This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.

The difference is that you can move that whole bunch of interlinked containers to another machine and it will work. You don't get that when running on bare hardware. The technology of "containers" is ultimately about having the kernel expose a cleaned up "namespaced" interface to userspace running inside the container, that abstracts away the details of the original machine. This is very much not intended as "sandboxing" in a security sense, but for most other system administration purposes it gets pretty darn close.
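You can see that namespaced interface from any Linux process (Linux-specific; requires /proc). A container is, at bottom, just a process launched with fresh instances of these namespaces:

```shell
# Every Linux process's namespaces are exposed under /proc.
# A containerized process simply points at different instances of these.
ls -l /proc/self/ns/
```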

BobbyTables2 · yesterday at 8:39 PM

I’ve had similar concerns.

At some point, few people even understand the whole system and whether all these layers are actually accomplishing anything.

It’s especially bad when the code running at rarified levels is developed by junior engineers and “sold” as an opaque closed source thing. It starts to actually weaken security in some ways but nobody is willing to talk about that.

“It has electrolytes…”

j-krieger · yesterday at 9:11 PM

> This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.

Yea, with unneeded bloat like role-based access controls, ACS and secret management. Some comments on this site.

kitd · yesterday at 6:51 PM

> This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.

I think you're over-egging the pudding. In reality, you're unlikely to use more than 2 types of container host (local dev and normal deployment maybe), so I think we've moved way beyond square 1. Config is normally very similar, just expressed differently, and being able to encapsulate dependencies removes a ton of headaches.

drawnwren · yesterday at 8:57 PM

Nix is where we're going. Maybe not with the configuration language that annoys python devs, but declarative reproducible system closures are a joy to work with at scale.
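A minimal sketch of such a declarative environment (assuming a recent nixpkgs; the exact package attribute names may differ):

```nix
# shell.nix — the whole development environment is one expression;
# evaluating it on another machine yields the same closure.
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
  packages = [ pkgs.python3 pkgs.nodejs ];
}
```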

soulofmischief · yesterday at 8:01 PM

I've been running either Qubes OS or KVM/QEMU based VMs as my desktop daily driver for 10 years. Nothing runs on bare metal except for the host kernel/hypervisor and virt stack.

I've achieved near-native performance for intensive activities like gaming, music and visual production. Hardware acceleration is kind of a mess, but using tricks like GPU passthrough for multiple cards, dedicated audio cards, and block device passthrough, I can achieve great latency and performance.

One benefit of this is that my desktop acts as a mainframe, and streaming machines to thin clients is easy.

My model for a long time has been not to trust anything I run, and this allows me to keep both my own and my client's work reasonably safe from a drive-by NPM install or something of that caliber.

Now that I also use an Apple Silicon MacBook as a daily driver, I very much miss the comfort of a fully virtualized system. I do stream in virtual machines from my mainframe. But the way Tahoe is shaping up, I might soon put Asahi on this machine and go back to a fully virtualized system.

I think this is the ideal way to do things, however, it will need to operate mostly transparently to an end user or they will quickly get security fatigue; the sacrifices involved today are not for those who lack patience.

Also, relevant XKCDs:

https://www.explainxkcd.com/wiki/index.php/2044:_Sandboxing_...

https://www.explainxkcd.com/wiki/index.php/2166:_Stack

fragmede · yesterday at 11:27 PM

If you're going to bring up ARM and EL levels, but not mention rings/CPL on x86, the discussion seems incomplete.

gucci-on-fleek · today at 4:49 AM

> Now hardware virtualization. I like how AArch64 generalizes this: there are 4 levels of privilege baked into the architecture. Each has control over the lower and can call up the one immediately above to request a service. Simple. Let's narrow our focus to the lowest three: EL0 (classically the user space), EL1 (the kernel), EL2 (the hypervisor). EL0, in most operating systems, isn't capable of doing much on its own; its sole purpose is to do raw computation and request I/O from EL1. EL1, on the other hand, has the powers to directly talk to the hardware.

> Everyone is happy, until the complexity of EL1 grows out of control and becomes a huge attack surface, difficult to secure and easy to exploit from EL0. Not good. The naive solution? Go a level above, and create a layer that will constrain EL1, or actually, run multiple, per-application EL1s, and punch some holes through for them to still be able to do the job—create a hypervisor. But then, as those vaguely defined "holes", also called system calls and hypercalls, grow, won't the attack surface grow too?

(Disclaimer: this is all very far from my area of expertise, so some of the below may be wrong/misleading)

Nobody can agree whether microkernels or monolithic kernels are "better" in general, but most people seem to agree that microkernels are better for security [0], with seL4 [1] being a fairly strong example. But microkernels carry a performance cost, which was much more noticeable on the slower computers of the 90s, when security was also less of a concern, so essentially every mainstream operating system of that era used some sort of monolithic kernel. These days, people might prefer a different security–performance tradeoff, but we're still using kernels designed in the 90s, so it isn't easy to change this any more.

Moving things to the hypervisor level lets us gain most of the security benefits of microkernels while maintaining near-perfect compatibility with the classic Linux/NT kernels. And the combination of faster computers (performance overheads therefore being less of an issue), more academic research, and high-quality practical implementations [2] means that I don't expect the current microkernel-style hypervisors to gain much new attack surface.

This idea isn't without precedent either—Multics (from the early 70s) was partially designed around security, and used a similar design with hardware-enforced hierarchical security levels [3]. Classic x86 also supports 4 different "protection rings" [4], and virtualisation plus nested virtualisation adds 2 more, but nothing ever used rings 1 and 2, so adding virtualisation just brings us back to the same number of effective rings as the original design.

[0]: https://en.wikipedia.org/wiki/Microkernel#Security

[1]: https://sel4.systems/

[2]: https://xenproject.org/projects/hypervisor/

[3]: https://en.wikipedia.org/wiki/Multics#Project_history

[4]: https://en.wikipedia.org/wiki/Protection_ring
