logoalt Hacker News

muvlontoday at 8:33 AM15 repliesview on HN

Yes there is tons of surprising and weird UB in C, but this article doesn't do a great job of showcasing it. It barely scratches the surface.

Here's a way weirder example:

  volatile int x = 5;
  printf("%d in hex is 0x%x.\n", x, x);
This is totally fine if x is just an int, but the volatile makes it UB. Why? 5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other.

So in common parlance, a "data race" is any concurrent accesses to the same object from different threads, at least one of which is a write. In C, we can have a data race on a single thread and without any writes!


Replies

thomashabets2today at 10:56 AM

Author here.

> It barely scratches the surface.

I agree. The point of the post is not to enumerate and explain the implications of all 283 uses of the word "undefined" in the standard. Nor enumerate all the things that are undefined by omission.

The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.

And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.

The (one!) exploitable flaw found by Mythos in OpenBSD was an impressive endorsement of the OpenBSD developers, and yet as the post says, I pointed it at the simplest of their code and found a heap of UB.

Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.

FTA:

> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C and C++ code has UB.

show 2 replies
tialaramextoday at 10:53 AM

Volatile is a type system hack. They should have done a more principled fix, and certainly modern languages should not act as though "C did it" makes it a good idea.

The reason for the hack is that very early C compilers just always spill, so you can write MMIO driver code by setting a pointer to point at the MMIO hardware and it actually works because every time you change x the CPU instruction performs a memory write.

Once C compilers got some basic optimisations that obvious "clever" trick stops working because the compiler can see that we're just modifying x over, and over and over, and so it doesn't spill x from a register and the driver doesn't work properly. C's "volatile" keyword is a hack saying "OK compiler, forget that optimisation" which was presumably a few minutes work to implement, whereas the correct fix, providing MMIO intrinsics in the associated library, was a lot of work.

Why should you want intrinsics here? Intrinsics let you actually spell out what's possible and what isn't. On some targets we can actually do a 1-byte 2-byte and 4-byte write, those are distinct operations and the hardware knows, so e.g. maybe some device expects a 4-byte RGBA write and so if you emit four 1-byte writes that's very confusing and maybe it doesn't work, don't do that. On some targets bit-level writes are available, you can say OK, MMIO write to bit 4 of address 0x1234 and it will write a single bit. If you only have volatile there's no way to know what happens or what it means.

show 2 replies
HarHarVeryFunnytoday at 12:14 PM

> In C, we can have a data race on a single thread and without any writes!

Well, sure, that's what volatile means - that the value may be changed by something else. If it's a global variable then the something else might be an interrupt or signal handler, not just another thread. If it's a pointer to something (i.e. read from a specific address) then that could be a hardware device register who's value is changing.

The concept of a volatile variable isn't the problem - any language that is going to support writing interrupt routines and memory mapped I/O needs to have some way of telling the compiler "don't optimize this out" since reading from the same hardware device register twice isn't like reading from the same memory location twice.

I think the problem here is more that not all of the interactions between language features and restrictions have been fully thought out. It's pretty stupid to be able to explicity tell the language "this value can change at any time", and for it to still consider certain uses of that value as UB since it can change at any time! There should have been a carve out in the "unsequenced side effect" definitions for volatile variables.

show 1 reply
prontoday at 12:28 PM

> In C, we can have a data race on a single thread and without any writes!

You need to distinguish between a UB and a race, and I think that's something that discussions of UB miss. Take any C program and compile it. Then disassemble it. You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.

UB is a property of a source program, not the executable. It means that the spec for the language in which the source is written doesn't assign it any meaning. But the executable that's the result of compiling the program does have a meaning assigned to it by the machine's spec, as machine code doesn't have UB.

A race is a property of the behaviour of a program. So it's true to say that your C program has UB, but the executable won't actually have a race. Of course, a C compiler can compile a program with UB in any way it likes so it's possible it will introduce a race, but if it chooses to compile the program in a way that doesn't introduces another thread, then there won't be a race.

show 2 replies
simonasktoday at 8:39 AM

I think the article's point is that you don't actually have to get weird at all to run into UB.

Lots of people mistakenly think that C and C++ are "really flexible" because they let you do "what you want". The truth of the matter is that almost every fancy, powerful thing you think you can do is an absolute minefield of UB.

show 3 replies
mananaysiempretoday at 10:13 AM

And it makes sense as long as you allow the concept of unsequenced operations at all (admittedly it’s somewhat rare; e.g. in Scheme such things are defined to still occur in sequence, but which specific sequence is unspecified and potentially different each time). The “volatile” annotation marks your variable as being an MMIO register or something of that nature, something that could change at any point for reasons outside of the compiler’s control. Naturally, this means all of the hazards of concurrent modification are potentially there.

That said, your “common parlance” definition of “data race” is not the definition used by the C standard, so your last sentence is at best misleading in a discussion of standard C.

> The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

(Here “conflicting” and “happens before” are defined in the preceding text.)

show 1 reply
bertitoday at 10:29 AM

Reading a register from a microcontroller peripheral may well reset it as an example of a possible side-effect here, and that's exactly the kind of thing you use volatile for.

rocketrascaltoday at 1:12 PM

Are you sure?

>unsequenced side effects on the same scalar object are UB

>6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other.

Read 5.1.2.4.3:

"If A is not sequenced before or after B, then A and B are unsequenced."

"Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which."

With a footnote saying this:

"9)The executions of unsequenced evaluations can interleave. Indeterminately sequenced evaluations cannot interleave, but can be executed in any order."

I.e the standard makes a distinction between "unsequenced" and "indeterminately sequenced". And with no mention of side effects on "indeterminately sequenced" being UB it leads me to conclude that your example is not UB.

zahlmantoday at 1:48 PM

> Here's a way weirder example:

Well, yes; but when the C standard authors wrote like this, they surely had in mind "the reads could be in either order, therefore the output could display the polled values in either order". Not C++ nasal demons.

And yeah, being able to say "reading is a side effect" is important when for example you interact with certain memory-mapped devices.

sethevtoday at 9:34 AM

Yes, there is a data race there. The value of a volatile can be changed by something outside the current thread. That’s what volatile means and why it exists.

Edit: thread=thread of execution. I’m not making a point about thread safety within a program.

show 3 replies
RobotToastertoday at 10:52 AM

With volatile it could be changed by an interrupt service routine between reads, so it makes sense.

rramadasstoday at 11:02 AM

This has got nothing to do with data races etc. but everything to do with "Sequence Points and Single Update Rule" which is well described in C language specification.

See my comment here - https://news.ycombinator.com/item?id=48205760

imtringuedtoday at 11:40 AM

Memory mapped IO sends a read request to a peripheral which is allowed have side effects in the background and return two different values upon a read. You can think of it as a synchronous RPC request.

The lack of argument sequencing feels utterly petty however.

netrikaretoday at 1:58 PM

[dead]