The speed of C is still largely intrinsic to the language.
The primitives map directly onto the actual silicon. A function call is actually going to turn into a call instruction (or get inlined). The order of the members in your struct is the order they have in memory, etc. A pointer being dereferenced is a load/store.
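To make that concrete, here's a rough sketch. The `struct packet` type and the function names are made up for illustration, and the exact instructions and padding depend on your compiler and ABI, so treat the comments as "typically", not "always".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, used only for illustration. */
struct packet {
    uint8_t  tag;      /* first member, always at offset 0 */
    uint32_t length;   /* later in memory than tag (padding may sit between) */
    uint16_t checksum; /* later in memory than length */
};

/* The language guarantees the first member sits at the start of the struct. */
static_assert(offsetof(struct packet, tag) == 0, "first member at offset 0");

/* Typically a single load from p + offsetof(struct packet, length). */
uint32_t packet_length(const struct packet *p) {
    return p->length;
}

/* Typically a single store to p + offsetof(struct packet, checksum). */
void set_checksum(struct packet *p, uint16_t value) {
    p->checksum = value;
}

/* Members are laid out in declaration order, so offsets strictly increase. */
int members_in_declaration_order(void) {
    return offsetof(struct packet, tag) < offsetof(struct packet, length)
        && offsetof(struct packet, length) < offsetof(struct packet, checksum);
}
```

If you compile this with optimizations and look at the assembly, `packet_length` is usually just a load and a return. There is no lookup and no indirection beyond the one pointer you wrote.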
The converse holds as well. Interpreted languages are slow because they don't have this direct correspondence with the hardware.
When you have a poopy compiler that does lots of register shuffling, you lose this correspondence.
Specifically, the constant spilling in those specific functions that had the 1000x slowdown makes the C code look a lot more like Python code (where every variable is several dereferences away).
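A rough way to see what spilling does, as a sketch only: below, `volatile` is my stand-in for a compiler that keeps every local in memory. It is not what the compiler in question actually emits, and the function names are made up. The two functions compute the same thing, but the second one has to go through memory for every read and write of `i` and `sum`.

```c
#include <stddef.h>

/* With an optimizing compiler, i and sum usually live in registers. */
long sum_registers(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* volatile forces each access to i and sum to be a real load or store,
   which roughly imitates a compiler that spills every local to the stack. */
long sum_spilled(const int *a, size_t n) {
    volatile long sum = 0;
    for (volatile size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Same source-level logic, same result, but the second version has lost the "a variable is a register" property, which is the point about it starting to look like an interpreter.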
Right - maybe we're saying the same thing. C is naturally amenable to being blazing fast, but if you compile it without trying to be efficient (not trying to be inefficient, just doing the simplest, most naive thing) it's still slow - by 1-1.5 orders of magnitude.