> Show me somebody who calls the IBM S/360 a RISC design, and I will show you somebody who works with the s390 instruction set today.
Ahaha so true.
But to answer the post's main question:
> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
Because backwards compatibility is more valuable than elegant designs. Because array-crunching performance is more important than safety. Because a fix for a V8 vulnerability can be quickly deployed while a hardware vulnerability fix cannot. Because you can express any object model on top of flat memory, but expressing one object model (or flat memory) in terms of another object model usually costs a lot. Because nobody ever agreed on what the object model should be. But most importantly: because "memory safety" is not worth the costs.
> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
What a weird question, conflating one thing with the other.
I’m working on an object-capability system, and trying hard to see if I can make it work using a linear address space so I don’t have to waste two or three pages per “process”.[1][2] I really don’t see how objects have anything to do with virtual memory and memory isolation, as they are a higher abstraction. These objects have to live somewhere, unless the author is proposing a system without the classical model of addressable RAM.
--
1: the reason I prefer a linear address space is that I want to run millions of actors/capabilities on a machine, and the latency and memory usage of switching address space and registers become really onerous. Also, I am really curious to see how ridiculously fast modern CPUs are when you’re not thrashing the TLB every millisecond or so.
2: in my case I let system processes/capabilities written in C run in linear address space where security isn’t a concern, and user space in a RISC-V VM so they can’t escape. The dream is that CHERI actually goes into production and user space can run on hardware, but that’s a big if.
The memory management story is still a big question: how do you do allocations in a linear address space? If you give out pages, there’s a lot of wastage. The alternative is a global memory allocator, which I am really not keen on. Still figuring out as I go.
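To make that tradeoff concrete, here is a minimal sketch (in Python, all names hypothetical) of the "global memory allocator" alternative: a first-fit allocator handing out byte ranges from one linear region, with coalescing on free. It wastes less than page-granular handouts, at the cost of global allocator state and fragmentation bookkeeping:

```python
# Minimal first-fit allocator over one linear region (illustrative sketch).
# Sub-page allocations waste less than handing out whole pages, but the
# free-list state is global and fragmentation must be managed explicitly.

class LinearAllocator:
    def __init__(self, size):
        # free list of (offset, length) holes, kept sorted by offset
        self.free = [(0, size)]

    def alloc(self, n):
        for i, (off, length) in enumerate(self.free):
            if length >= n:
                # carve the allocation out of the front of the hole
                if length == n:
                    self.free.pop(i)
                else:
                    self.free[i] = (off + n, length - n)
                return off
        raise MemoryError("no hole large enough")

    def free_block(self, off, n):
        # put the hole back, then coalesce adjacent holes
        self.free.append((off, n))
        self.free.sort()
        merged = []
        for o, l in self.free:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((o, l))
        self.free = merged

a = LinearAllocator(4096)
p = a.alloc(100)
q = a.alloc(200)
a.free_block(p, 100)
a.free_block(q, 200)
print(a.free)  # coalesced back to [(0, 4096)]
```

A real kernel would use something closer to a buddy or slab allocator for bounded-time operation; this only illustrates why "give out pages" and "one global allocator" sit at opposite ends of the wastage-vs-bookkeeping spectrum.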
> the data bus is 128 bits wide: 64-bit for the data and 64-bit for data's type
That seems a bit wasteful if you're not using a lot of object types.
If the author is reading these comments: Please write about the fully semantic IDE as soon as you can. Very interested in hearing more about that, as it sounds like you've used it a lot.
So how do you hook up such a system to actual RAM or EPROMs to allow it to function? Somewhere there has to be an actual address generated.
Author here.
This is one of those things, where 99.999% of all IT people have never even heard or imagined that things can be different than "how we have always done it." (Obligatory Douglas Adams quote goes here.)
This makes a certain kind of people, self-secure in their own knowledge, blurt out words like "clueless", "fail miserably" etc., based on insufficient depth of actual knowledge. To them I can only say: Study harder, this is so much more technologically interesting than you can imagine.
And yes, neither the iAPX432 nor, for that matter, the Z8000 fared well with their segmented memory models, but it is important to remember that they primarily failed for entirely different reasons, mostly out-of-touch top management, so we cannot, and should not, conclude from that that all such memory models cannot possibly work.
There are several interesting memory models, which never really got a fair chance, because they came too early to benefit from VLSI technology, and it would be stupid to ignore a good idea, just because it was untimely. (Obligatory "Mother of all demos" reference goes here.)
CHERI is one such memory model, and probably the one we will end up with, at least in critical applications: Stick with the linear physical memory, but cabin the pointers.
In many applications, that can allow you to disable all the virtual memory hardware entirely. (I think the CHERIoT project does this?)
The R1000 model is different, but as far as I can tell equally valid, but it suffers from a much harder "getting from A to B" problem than CHERI does, yet I can see several kinds of applications where it would totally scream around any other memory model.
But if people have never even heard about it, or think that just because computers look a certain way today, every other idea we tried must by definition have been worse, nobody will ever do the back-of-the-napkin math to see if it would make sense to try it out (again).
I'm sure there are also other memory concepts, even I have not heard about. (Yes, I've worked with IBM S/38)
But what we have right now, huge flat memory spaces, physical and virtual, with a horribly expensive translation mechanism between them, and no pointer safety, is literally the worst of all imaginable memory models, for the kind of computing we do, and the kind of security challenges we face.
There are other similar "we have always done it that way" mental blocks we need to reexamine, and I will answer one tiny question below, by giving an example:
Imagine you sit somewhere in a corner of a HUGE project, like a major commercial operating system with all the bells and whistles, the integrated air-traffic control system for a continent, or the software for a state-of-the-art military gadget.
You maintain this library, which exports this function, which has a parameter which defaults to three.
For sound and sane reasons, you need to change the default to four now.
The compiler won't notice.
The linker won't notice.
People will need to know.
Who do you call?
In the "Rational Environment" on the R1000 computer, you change 3 to 4 and, when you attempt to save your change, the semantic IDE refuses, informing you that it would change the semantics of the following three modules, which call your function without specifying that parameter explicitly - even if you do not have read permission to the source code of those modules.
The Rational Environment did that 40 years ago; can your IDE do that for you today?
Some developers get a bit upset about that when we demo that in Datamuseum.dk :-)
The difference is that all modern IDEs regard each individual source file as "ground truth", but have nothing even remotely like an overview, or conceptual understanding, of the entire software project.
Yeah, sure, it knows what include files/declaration/exports things depend on, and which source files to link into which modules/packages/libraries, but it does not know what any of it actually means.
And sure, grep(1) is wonderful, but it only tells you what source code you need to read - provided you have the permission to do so.
In the Rational Environment, ground truth is the parse tree, plus what can best be described as a "preliminary symbol resolution", which is why it knows exactly which lines of code, in the entire project, call your function, with or without what parameters.
Not all ideas are good.
Not all good ideas are lucky.
Not all forgotten ideas should be ignored.
> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
Maybe it's because even though x86-64 is a 64-bit instruction set, the direct CALL and JMP instructions still only support relative 8-bit or 32-bit displacements.
> Translating from linear virtual addresses to linear physical addresses is slow and complicated, because 64-bit can address a lot of memory.
Sure but spend some time thinking about how GOT and PLT aren't great solutions and can easily introduce their own set of security complications due to the above limitations.
The Rational R1000 is an interesting (and obscure) example to use - IBM's S/38 and AS/400 (now IBM i) also took a similar approach, and saw far more widespread usage.
What a clueless post. Even ignoring their massive overstatement of the difficulty and hardware complexity of hardware mapping tables, they appear to not even understand the problems solved by mapping tables.
Okay, let us say you have a physical object store. How are the actual contents of those objects stored? Are they stored in individual, isolated memory blocks? What if I want to make a 4 GB array? Do I need to have 4 GB memory blocks? What if I only have 6 GB? That is obviously unworkable.
Okay, we can solve that by compacting our physical object store onto a physical linear store and just presenting an object store as an abstraction. Sure, we have a physical linear store, but we never present that to the user. But what if somebody deallocates an object? Obviously we should be able to reuse that underlying physical linear store. What if they allocated a 4 GB array? Obviously we need to be able to fragment that into smaller pieces for future objects. What if we deallocated 4 GB of disjoint 4 KB objects? Should we fail to allocate an 8 KB object just because the fragments are not contiguous? Oh, just keep in mind the precise structure of the underlying physical store to avoid that (what a leaky and error-prone abstraction). Oh, but what if there are multiple programs running, some potentially even buggy: how the hell am I supposed to keep track of the global fragmentation of the shared physical store?
Okay, we can solve all of that with a level of indirection, by giving you a physical object key instead of a physical object "reference". You present the key, and a runtime structure lets us look up where in the physical linear store we have put that data. This allows us to move and compact the underlying storage while letting you keep a stable key. Now we have a mapping between object key and linear physical memory. But what if there are multiple programs on the same machine, some of which may be untrustworthy? What if they just start using keys they were not given? Obviously we need some scheme to prevent anybody from using any key. Maybe we could solve that by tagging every object in the system with a list of every program allowed to use it? But the number of programs is dynamic, and if we have millions or billions of objects, each new program would require re-tagging all of those objects. We could make that list encode only "allowed" programs, which would save space and cleanup work, but how would the hardware do that lookup efficiently, and how would it store that data efficiently?
Okay, we can solve that by having a per-program mapping between object key to linear physical memory. Oh no, that is looking suspiciously close to the per-program mapping between linear virtual memory to linear physical memory. Hopefully there are no other problems that will just result in us getting back to right where we started. Oh no, here comes another one. How is your machine storing this mapping between object key to linear physical memory? If you will remember from your data structures courses, those would usually be implemented as either a hash table or a tree. A tree sounds too suspiciously close to what currently exists, so let us use a hash table.
Okay, cool, how big should the hash table be? What if I want a billion objects in this program and a thousand objects in a different program? I guess we should use a growable hash table. All that happens is that if we allocate enough objects, we allocate a new, dynamically sized storage structure, then bulk rehash and insert all the old objects. That is amortized O(1), just at the cost of an unpredictable pause on potentially any memory allocation, which can not only be gigantic, but is proportional to the number of live allocations. That is fine if our goal is just putting in a whole hardware garbage collector, but not really applicable for high-performance computing. For high-performance computing we would want worst-case bounded time and memory cost (not amortized, but per-operation).
Okay, I guess we have to go with a per-program tree-based mapping from object key to linear physical memory. But it is still an object store, so we won, right? How is the hardware going to walk that efficiently? For the hardware to walk it efficiently, you are going to want a highly regular structure with high fanout, both to maximize the value of the cache lines you load and to reduce the worst-case number of cache lines you need to load. So you will want a B-Tree structure of some form. Oh no, that is exactly what hardware mapping tables look like.
But it is still an object store, so we won, right? But what if I deallocated 4 GB of disjoint 4 KB objects? You could move and recompact all of that memory, but why? You already have a mapping structure with a layer of indirection via object keys. Just create an interior mapping within an object between the object-relative offsets and potentially disjoint linear physical memory. Then you do not need physically contiguous backing; you can use a disjoint physical linear store to provide the abstraction of an object linear store.
And now we have a per-program tree-based mapping between linear object address to linear physical memory. But what if the objects are of various sizes? In some cases the hardware will traverse the mapping from object key to linear object store, then potentially need to traverse another mapping from a large linear object address to linear physical memory. If we just compact the linear object store mappings, then we can unify the trees and just provide a common linear address to linear physical memory mapping and the tree-based mapping will be tightly bounded for all walks.
And there we have it, a per-program tree-based mapping between linear virtual memory and linear physical memory one step at a time.
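The convergence is easy to see in code. Below is a hedged sketch of the structure the argument ends at: a fixed-fanout radix tree from linear virtual address to physical frame. The parameters mirror a common 4-level, 4 KiB-page layout, but are purely illustrative; real hardware stores each table as a 512-entry array in a physical page, not a Python dict.

```python
PAGE_BITS = 12    # 4 KiB pages
INDEX_BITS = 9    # 512-entry tables
LEVELS = 4        # 4 * 9 + 12 = 48-bit virtual addresses

def map_page(root, vaddr, paddr):
    """Install a mapping, allocating interior tables on demand."""
    node = root
    for level in reversed(range(1, LEVELS)):
        shift = PAGE_BITS + level * INDEX_BITS
        index = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
        node = node.setdefault(index, {})
    node[(vaddr >> PAGE_BITS) & ((1 << INDEX_BITS) - 1)] = paddr

def walk(root, vaddr):
    """Translate by walking the radix tree, level by level,
    the same shape of walk a hardware page-table walker does."""
    node = root
    for level in reversed(range(LEVELS)):
        shift = PAGE_BITS + level * INDEX_BITS
        index = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
        node = node.get(index)
        if node is None:
            raise KeyError("page fault: unmapped address")
    # leaf holds the physical frame base; keep the page offset
    return node | (vaddr & ((1 << PAGE_BITS) - 1))

root = {}
map_page(root, 0x7fff_dead_b000, 0x1234_5000)
print(hex(walk(root, 0x7fff_dead_b042)))  # 0x12345042
```

Rename `root` to "per-program object table" and `vaddr` to "object key plus offset" and nothing in the code changes, which is the point of the derivation above.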
armv8/VMSAv8-64 has huge page support, with an optional contiguous bit allowing mappings of up to 16GB at a time [0] [1]. That results in (almost) no TLB misses on any practical amount of memory available today.
Likely the issue is a combination of most user systems not configuring huge pages and developers not being keen on using things they can't test locally. Huge pages are prominent in the single-app server and game console spaces, though.
[0] https://docs.kernel.org/arch/arm64/hugetlbpage.html
[1] https://developer.arm.com/documentation/ddi0487/latest (section D8.7.1 at the time of writing)
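Back-of-the-envelope arithmetic (Python, granule sizes taken from the comment above) shows why bigger mappings make the translation cost nearly vanish: the number of TLB entries needed to cover a fixed working set drops by the same factor the page size grows:

```python
# TLB entries needed to cover a 16 GiB working set at each mapping size.
# 16 GiB is the maximum contiguous-bit mapping mentioned above.

GiB = 1 << 30
span = 16 * GiB
for name, size in [("4 KiB", 4 << 10), ("2 MiB", 2 << 20),
                   ("1 GiB", 1 << 30), ("16 GiB", 16 * GiB)]:
    print(f"{name:>7}: {span // size:>8} entries")
```

With 4 KiB pages the span needs over four million entries (guaranteed TLB thrashing); with 16 GiB mappings it needs one.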
CHERI is undeniably on the rise. Adapting existing code generally only requires rewriting less than 1% of the codebase. It offers speedups for existing as well as new languages (designed with the hardware in mind). I expect to see it everywhere in about a decade.
I think you could argue there is already some effort to do type safety at the ISA register level, with e.g. shadow stack or control flow integrity. Isn't that very similar to this, except targeting program state rather than external memory?
What about an architecture, where there are pages and access permissions, but no translation (virtual address is always equal to physical)? fork() would become impossible, but Windows is fine without it anyway.
> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
But what happens when the in-memory size of objects approaches 2⁶⁴? How to even map such a thing without multi-level page tables?
“Why don’t we do $thing_that_decisively_failed instead of $thing_that_evolved_to_beat_all_other_approaches?” Usually this sort of question comes from a lack of understanding of the history of the failure of the first and the success of the second.
The fence principle always applies: “don’t tear down a fence till you understand why it was built.”
Linear address spaces fit how computers actually operate: in layers. Objects are hard to deal with for layers that don’t know about them. Bytes aren’t; they are just bytes. How do you page out “an object”? Do I now need to solve the knapsack problem to efficiently tile objects on disk based on their most recent use time and size? …1000 other things…
> Like mandatory seat belts, some people argue that there would be no need for CHERI if everyone "just used type-safe languages"[...] I'm not having any of it.
I wish the author had offered a more detailed refutation than "I'm not having it". I'm pretty sure the claim is right! I'm fairly convinced that we'd be a lot better off moving to ring0-only linear-memory architectures and relying on abstraction-theoretic security ("langsec") rather than fattening up the hardware with random whack-a-mole mitigations. We're gradually moving in that direction anyway without much of a concerted effort.
Did Multics solve this in any way?
An open secret in our field is: the current market leading OSes and (to some extent) system architectures are antiquated and sub-optimal at their foundation due to backward compatibility requirements.
If we started green field today and managed to mitigate second system syndrome, we could design something faster, safer, overall simpler, and easier to program.
Every decent engineer and CS person knows this. But it’s unlikely for two reasons.
One is that doing it while avoiding second system syndrome takes teams with a huge amount of both expertise and discipline. That includes the discipline to be ruthless about exterminating complexity and saying no. That’s institutionally hard.
The second is that there isn’t strong demand. What we have is good enough for what most of the market wants, and right now all the demand for new architecture work is in the GPU/NPU/TPU space for AI. Nobody is interested in messing with the foundation when all the action is there. The CPU in that world is just a job manager for the AI tensor math machine.
Quantum computing will be similar. QC will be controlled by conventional machines, making the latter boring.
We may be past the window where rethinking architectural choices is possible. If you told me we still had Unix in 2000 years I would consider it plausible.
This is like saying generic systems are bad because both you and a hacker can make sane assumptions about them, so even if they are more performant/usable, they are also more vulnerable and hence shouldn't be used.
I don't understand this.
I have seen bad takes but this one takes the cake. Brilliant start to 2026...
How does object store hardware work? Doesn't it still require a cache?
Any papers on modern object store architectures (is that the right terminology?)
> Because the attempts at segmented or object-oriented address spaces failed miserably.
where, what, evidence of this please...
More advocacy propaganda for corporate authoritarianism under the guise of "safety". Locked-down systems like he describes fortunately died out long ago, but they are making a vicious comeback and will take over unless we fight it as much as we can.
> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
Because the attempts at segmented or object-oriented address spaces failed miserably.
> Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory.
That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.
The 8086 was sort-of segmented to get 20-bit addresses out of a 16-bit machine; it was a stop-gap and a huge success. The 80286 did things "properly" again and went all-in on segments when going to virtual memory... and sucked. Best I remember, it was used almost exclusively as a faster 8086, with the 80286 modes used to page memory in and out, and with the "reset and recover" hack to then get back to real mode for real work.
The 80386 introduced the flat address space and paged virtual memory not because of backwards-compatibility, but because it could and it was clearly The Right Thing™.