logoalt Hacker News

thomashabets2today at 9:08 AM1 replyview on HN

Author here.

> -Acceptance: "Just dont write UB."

The point of my article is that this is not possible. This cannot be our end state, as long as humans are the ones writing the code. No human can avoid writing UB in C/C++.


Replies

jarttoday at 10:23 AM

It's honestly not that difficult to be rigorous. The things you mentioned in the blog post are pretty obvious forms of degenerate practices once you get used to seeing them. The best way to make your argument would be to bring up pointer overflow being ub. What's great about undefined behavior is that the C language doesn't require you to care. You can play fast and loose as much as you want. You can even use implicit types and yolo your app, writing C that more closely resembles JavaScript, just like how traditional k&r c devs did back in the day under an ilp32 model. Then you add the rigor later if you care about it. For most stuff, like an experiment, we obviously don't care, but when I do, I can usually one shot a file without any UB (which I check by reading the assembly output after building it with UBSAN) except there's just one thing that I usually can't eliminate, which is the compiler generating code that checks for pointer overflow. Because that's just such a ridiculous concept on modern machines which have a 56 bit address space. Maybe it mattered when coding for platforms like i8086. I've seen almost no code that cares about this. I have to sometimes, in my C library. It's important that functions like memchr() for example don't say `for (char *p = data, *e = data + size; p<e; ...` and instead say `for (size_t i = 0; i < n; ++i) ...data[i]...`. But these are just the skills you get with mastery, which is what makes it fun. Oh speaking of which, another fun thing everyone misses is the pitfalls of vectorization. You have to venture off into UB land in order to get better performance. But readahead can get you into trouble if you're trying to scan something like a string that's at the end of a memory page, where the subsequent page isn't mapped. My other favorite thing is designing code in such a way that the stack frame of any given function never exceeds 4096 bytes, and using alloca in a bounded way that pokes pages if it must be exceeded. If you want to have a fun time experiencing why the trickiness of UB rules are the way they are, try writing your own malloc() function that uses shorts and having it be on the stack, so you can have dynamic memory in a signal handler.