logoalt Hacker News

habermantoday at 7:13 AM0 repliesview on HN

More concretely, I think the magic lies in these two properties:

1. Conservation of mass: the amount of C code you put in will be pretty close to the amount of machine code you get out. Aside from the preprocessor, which is very obviously expanding macros, there are almost no features of C that will take a small amount of code and expand it to a large amount of output. This makes some things annoyingly verbose to code in C (eg. string manipulation), but that annoyance is reflecting a true fact of machine code, which is that it cannot handle strings very easily.

2. Conservation of energy: the only work that will be performed is the code that you put into your program. There is no "supervisor" performing work on the side (garbage collection, stack checking, context switching), on your behalf. From a practical perspective, this means that the machine code produced by a C compiler is standalone, and can be called from any runtime without needing a special environment to be set up. This is what makes C such a good language for implementing garbage collection, stack checking, context switching, etc.

There are some exceptions to both of these principles. Auto-vectorizing compilers can produce large amounts of output from small amounts of input. Some C compilers do support stack checking (eg. `-fstack-check`). Some implementations of C will perform garbage collection (eg. Boehm, Fil-C). For dynamically linked executables, the PLT stubs will perform hash table lookups the first time you call a function. The point is that C makes it very possible to avoid all of these things, which has made it a great technology for programming close to the machine.

Some languages excel at one but not the other. Byte-code oriented languages generally do well at (1): for example, Java .class files are usually pretty lean, as the byte-code semantics are pretty close to the Java langauge. Go is also pretty good at (1). Languages like C++ or Rust are generally good at (2), but have much larger binaries on average than C thanks to generics, exceptions/panics, and other features. C is one of the few languages I've seen that does both (1) and (2) well.