The provenance memory model for C

208 points • by HexDecOctBin • yesterday at 9:25 AM • 111 comments

Comments

gavinray • yesterday at 12:25 PM

Also of interest to folks looking at this might be TySan, the recently merged LLVM Type-Based Aliasing sanitizer:

https://clang.llvm.org/docs/TypeSanitizer.html

https://www.phoronix.com/news/LLVM-Merge-TySan-Type-Sanitize...
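For readers who haven't used it: TySan targets strict-aliasing violations, i.e. accessing an object through an lvalue of an incompatible type. Below is a minimal sketch of the kind of code involved. It assumes a 32-bit IEEE 754 `float`, and the function names are made up for illustration. Per the Clang documentation linked above, the sanitizer is enabled with `-fsanitize=type`.

```c
#include <stdint.h>
#include <string.h>

/* Violates strict aliasing: reads a float object through a uint32_t lvalue.
   This is the kind of access TySan is meant to report at run time. */
uint32_t float_bits_bad(float *f) {
    return *(uint32_t *)f;
}

/* Well-defined alternative: copy the object representation with memcpy. */
uint32_t float_bits_ok(float f) {
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "expects a 32-bit float");
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Only `float_bits_ok` is meant to be called. `float_bits_bad` exists to show what the sanitizer should flag.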


lioeters • yesterday at 12:23 PM

Looks like a code block didn't get closed properly, before this phrase:

> the functions `recip` and `recip⁺` and not equivalent

Several paragraphs after this got swallowed by the code block.

Edit: Oh, I didn't realize the article is by the author of the book, Modern C. I've seen it recommended in many places.

> The C23 edition of Modern C is now available for free download from https://hal.inria.fr/hal-02383654


tialaramex • yesterday at 12:20 PM

Presumably this was converted from markdown or similar and the conversion partly failed or the input was broken.

From the PVI section onward it seems to recover, but if the author sees this please fix and re-convert your post.

[Edit: nope, there are more errors further into the text. This needed proper proofreading before it was posted. I can somewhat struggle through because I already know this topic, but if this was intended to introduce newcomers, it's probably very confusing.]


nikic • yesterday at 8:45 PM

At least at a skim, what this specifies for exposure/synthesis for reads/writes of the object representation is concerning. One of the consequences is that dead integer loads cannot be eliminated, as they may have an exposure side effect. I guess C might be able to get away with it due to the interaction with strict aliasing rules. Still, I'm quite surprised that they are going against consensus here (which reduces the likelihood that these semantics will get adopted by implementers).
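To make the concern concrete, here is a sketch of such a dead load. It assumes, as the comment reads the proposal, that inspecting a pointer's bytes through a non-pointer type counts as an exposure. `read_repr_and_discard`, `touch_target` and `target` are hypothetical names. An optimizer today would delete the whole loop, because nothing uses its result. If the read exposes `target`'s storage, deleting it could change which later integer-to-pointer conversions are valid.

```c
#include <stddef.h>

static int target = 42;

/* Reads each byte of the pointer p through an unsigned char lvalue, i.e. it
   inspects p's object representation with a non-pointer type, and then
   throws the result away. Under the assumed exposure rule this is not a
   no-op: it would mark target's storage as exposed. */
void read_repr_and_discard(int *p) {
    unsigned char const *bytes = (unsigned char const *)&p;
    unsigned acc = 0;
    for (size_t i = 0; i < sizeof p; ++i)
        acc += bytes[i];
    (void)acc; /* dead: nothing observes acc */
}

int touch_target(void) {
    read_repr_and_discard(&target);
    return target;
}
```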


zombot • yesterday at 12:17 PM

Does C allow Unicode identifiers now, or is that pseudo code? The code snippets also contain `&amp;`, so something definitely went wrong with the transcoding to HTML.
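For the record: C99 already allowed identifiers containing characters outside basic ASCII. It did this via universal character names (`\uXXXX`), restricted to a list of permitted characters, and C23 replaced that list with the Unicode XID_Start/XID_Continue properties. Whether raw UTF-8 characters in the source are accepted is up to the implementation; current GCC and Clang accept them. Below is a minimal sketch using the portable UCN spelling. `circle_area` is a made-up example, and this does not settle whether the article's superscript identifiers such as `recip⁺` are permitted.

```c
/* Capital pi spelled as a universal character name (U+03A0). C99 and later
   allow UCNs in identifiers, restricted to a set of permitted characters. */
static double const \u03A0 = 3.14159265358979;

double circle_area(double r) {
    return \u03A0 * r * r;
}
```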


hinkley • yesterday at 9:04 PM

> Unfortunately no C compiler can do this optimization automatically:

> The functions recip and recip⁺ and not equivalent.

This is one of those examples of how optimizing code can improve legibility, robustness, or both.

The first implementation allows for side effects to change the outcome of the function. But the problem is that the code is not written expecting someone to modify the values in the middle of the loop. It's incorrect behavior, and you're paying a performance penalty for it to boot.

Functional Core code tends not to have this problem, in that we pass in a snapshot of data and it either gets an answer or an error.
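The article's `recip` code isn't reproduced here. The non-equivalence being discussed can be sketched with two hypothetical functions instead: the first re-reads a value through a pointer inside the loop, and the second takes a snapshot of it up front. If the pointer aliases the array being written, the two give different results. That is why a compiler cannot turn the first into the second on its own. Qualifying the pointers with `restrict` is the standard way to promise the compiler that they don't overlap.

```c
#include <stddef.h>

/* Re-reads *divisor on every iteration: a store to a[i] might change
   *divisor if the two overlap, so the compiler must assume they can. */
void divide_all(double *a, size_t n, double const *divisor) {
    for (size_t i = 0; i < n; ++i)
        a[i] /= *divisor;
}

/* Takes a snapshot of the divisor first, so later stores cannot affect it. */
void divide_all_snapshot(double *a, size_t n, double const *divisor) {
    double const d = *divisor;
    for (size_t i = 0; i < n; ++i)
        a[i] /= d;
}
```

Calling both with `divisor == &a[0]` on the array {2, 4, 8} shows the difference. The first yields {1, 4, 8}, because `a[0]` is already 1 by the second iteration. The second yields {1, 2, 4}.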

I've seen too much code that checks 3 times if a user is either still logged in or has permission to do a task, and not one of them was set up to deal with one answer for the first call and a different one for any of the subsequent ones. They just go into undefined behavior.

smcameron • yesterday at 4:04 PM

Ugh. Are unicode variable names allowed in C now? That's horrific.


gustedt • yesterday at 7:21 PM

Randomly introduced translation errors from markdown to WordPress's internal format should be fixed now. Sorry for the inconvenience!


cryptonector • today at 3:47 AM

:thank you:

This is great. I wonder what u/pizlonator thinks of it.

RossBencina • yesterday at 10:02 PM

After reading the fine article, I'm left wondering: what if you implement your own heterogeneous allocation scheme (e.g. TLSF) on top of malloc? In this case, all of your objects will belong to the same malloc'ed storage region and you will compute object offsets using raw pointers, but I'd expect provenance to potentially treat each returned object as if it were allocated from separate, disjoint storage.

I guess my question is: does this provenance model allow for recursive nesting of allocators with a separate notion of "storage" at each level?
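One way to picture the question is a hypothetical bump ("arena") allocator that carves every object out of a single `malloc` block. Every pointer it hands out is derived by arithmetic from that one block. As far as I understand the model, they therefore all carry the provenance of the same storage instance. Whatever disjointness the sub-allocations have comes from the allocator's own bookkeeping, not from provenance. The names `arena`, `arena_alloc`, etc. are made up for this sketch, and alignment is simplified to `max_align_t`.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base; /* the single malloc'd storage instance */
    size_t size;
    size_t used;
} arena;

int arena_init(arena *a, size_t size) {
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hands out n bytes aligned for any type; returns NULL when the arena is full. */
void *arena_alloc(arena *a, size_t n) {
    size_t const align = alignof(max_align_t);
    size_t const off = (a->used + align - 1) / align * align; /* round up */
    if (off > a->size || n > a->size - off)
        return NULL;
    a->used = off + n;
    return a->base + off; /* derived from base: same provenance as the block */
}

void arena_release(arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```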


b0a04gl • yesterday at 3:02 PM

provenance model basically turns memory back into a typed value. finally malloc won't just be a dumb number generator, it'll act more like a capability issuer. and access is not "is this address in range" anymore, but "does this pointer have valid provenance". way more deterministic, decouples gcc -wall
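A common way to illustrate the difference between "is this address in range" and "does this pointer have valid provenance" uses two adjacent objects. The sketch that follows does this; the variable and function names are arbitrary. Whether the addresses actually coincide is up to the implementation. The point is that even when they do, only the pointer derived from `obj_y` may be used to access `obj_y`.

```c
int obj_x = 1, obj_y = 2;

/* Returns 1 if it wrote 11 into obj_y, 0 otherwise. */
int write_y_if_adjacent(void) {
    int *p = &obj_x + 1; /* valid one-past-the-end pointer; provenance: obj_x */
    int *q = &obj_y;     /* provenance: obj_y */
    if (p == q) {        /* may be true if obj_y happens to follow obj_x */
        /* *p = 11; would be undefined: same address, but p's provenance is obj_x */
        *q = 11;         /* fine: q's provenance is obj_y */
        return 1;
    }
    return 0;
}
```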


eqvinox • yesterday at 4:03 PM

Using the "register" storage class feels really alien for C code written in 2025…
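For context, the one guarantee `register` still gives in C is that the object's address can never be taken. Applying `&` to such an object is a constraint violation that the compiler must diagnose. The object can therefore never be exposed or reached through a pointer, which is presumably why `register` shows up in a provenance discussion. A minimal sketch (`sum_ints` is a made-up name):

```c
#include <stddef.h>

int sum_ints(int const *a, size_t n) {
    register int acc = 0; /* &acc would be rejected: no pointer can ever alias acc */
    for (size_t i = 0; i < n; ++i)
        acc += a[i];
    return acc;
}
```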


dsp_person • yesterday at 6:26 PM

    if ((Π⁻ &lt; Π) &amp;&amp; (Π &lt; Π⁺)) {

I spent way too long trying to figure this out as C code instead of

    if ((Π⁻ < Π) && (Π < Π⁺)) {

nixpulvis • today at 1:37 AM

As a bit of an aside, the XOR doubly linked list example given here is super cool.
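For anyone who hasn't seen the trick: each node of an XOR list stores `prev ^ next` as a single integer, and traversal recovers the next node from the one it came from. In provenance terms, every `(uintptr_t)` conversion exposes a node. Every conversion back to a pointer then has to pick up the provenance of an exposed node at that address, which is the pattern the exposure/synthesis rules are meant to keep valid. This is a sketch with made-up names (`xnode`, `xlist_build`, ...). It assumes that `uintptr_t` exists and that a null pointer converts to 0; both hold on mainstream platforms, but neither is guaranteed by the standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct xnode {
    int value;
    uintptr_t link; /* (uintptr_t)prev ^ (uintptr_t)next */
} xnode;

/* Pointer-to-integer conversion: this is where a node gets exposed.
   Assumes (uintptr_t)NULL == 0. */
static uintptr_t addr(xnode const *p) { return (uintptr_t)p; }

/* Integer-to-pointer conversion: recovers the neighbour of cur that is not prev. */
static xnode *other(xnode const *cur, xnode const *prev) {
    return (xnode *)(cur->link ^ addr(prev));
}

void xlist_free(xnode *head) {
    xnode *prev = NULL, *cur = head;
    while (cur) {
        xnode *next = other(cur, prev);
        free(prev); /* prev is no longer needed once next is known */
        prev = cur;
        cur = next;
    }
    free(prev);
}

/* Builds a list from vals[0..n-1]; returns the head (NULL on failure) and stores the tail. */
xnode *xlist_build(int const *vals, size_t n, xnode **tail) {
    xnode *head = NULL, *prev = NULL;
    for (size_t i = 0; i < n; ++i) {
        xnode *node = malloc(sizeof *node);
        if (!node) {
            xlist_free(head);
            *tail = NULL;
            return NULL;
        }
        node->value = vals[i];
        node->link = addr(prev);      /* next is unknown yet: prev ^ 0 */
        if (prev)
            prev->link ^= addr(node); /* prev's link becomes its_prev ^ node */
        else
            head = node;
        prev = node;
    }
    *tail = prev;
    return head;
}

/* Walks from either end (head or tail) and copies up to max values into out. */
size_t xlist_walk(xnode const *start, int *out, size_t max) {
    xnode const *prev = NULL, *cur = start;
    size_t k = 0;
    while (cur && k < max) {
        out[k++] = cur->value;
        xnode const *next = other(cur, prev);
        prev = cur;
        cur = next;
    }
    return k;
}
```

Because each link is symmetric, the same `xlist_walk` traverses forwards from the head or backwards from the tail.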

jvanderbot • yesterday at 12:55 PM

I love Rust, but I miss C. If C can be updated to make it generally socially acceptable for new projects, I'd happily go back for some decent subset of things I do. However, there's a lot of anxiety and even angst around using C in production code.


jaisio • yesterday at 9:39 PM

The root cause of all this is that C programs are not much more than glorified assembly programs. Any effort to retrofit higher-level reasoning will always be defeated by somebody doing some dirty pointer tricks. This can only be solved by more abstract ways to express programs, which necessarily restrict the bare-metal dirty things one can do. But what you gain is that the compiler will easily be able to do lots of things that a C compiler can't do, or can do only with a lot of headache. The kind of stuff this article is about is really trying to solve the wrong problem, IMO.

Joker_vD • yesterday at 5:01 PM

> Here the term "same representation and alignment" covers for example the possibility to look at [...] one would be a structure and the other would be another structure that sits at the beginning of the first.

Does it? It is quite simple for a struct A that has struct B as its first member to have radically different alignment:

    struct B { char x; };

    struct A { struct B b; long long y; };

Also, accidentally coinciding pointers are nothing "rare" because all objects are allowed to be treated as 1-element arrays: so any pointer to an e.g. struct field is also a pointer one-past the previous field of this struct; also, malloc() allocations easily may produce "touching" objects. So thanks for allowing implementations to not have padding between almost every two objects, I guess.
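Both observations can be checked directly. The sketch uses the comment's `struct A`/`struct B`, plus a made-up `struct pair`. Exact alignment values are ABI-specific, so the sketch only reports them. On common 64-bit ABIs, `struct B` has alignment 1 while `struct A` inherits the 8-byte alignment of `long long`.

```c
#include <stdalign.h>
#include <stddef.h>

struct B { char x; };
struct A { struct B b; long long y; };

/* Alignment requirements of the two types from the comment. */
size_t align_of_B(void) { return alignof(struct B); }
size_t align_of_A(void) { return alignof(struct A); }

struct pair { int first; int second; };

/* &s->first + 1 is a valid one-past-the-end pointer (first counts as a
   one-element array). It compares equal to &s->second when there is no
   padding between the members, but it must not be dereferenced. */
int one_past_first_is_second(struct pair *s) {
    return &s->first + 1 == &s->second;
}
```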


briandw • yesterday at 1:31 PM

The code blocks are very difficult to read on this page. I had ChatGPT O3 rewrite this in a more accessible format. https://chatgpt.com/share/68629096-0624-8005-846f-7c0d655061...


cenobyte • yesterday at 3:59 PM

Please fix the code in your post.
