I've never heard of that rule (though tbh I'm not allocating > 64KB of stack when I'm in assembly) and it seems Google hasn't either. While I'm sure it makes sense, I don't think I've ever seen that be enforced. At least in C/C++. Maybe it makes more sense for these stack-inspecting garbage collectors, but I've also heard of ones that just scan the stack without unwinding anything. I did a test asking Google's AI to generate a complicated C function, put it in godbolt, and there's plenty of push push push push ... pop pop pop pop going on.
> While I'm sure [bumping the stack pointer atomically] makes sense, I don't think I've ever seen that be enforced. At least in C/C++.
That’s because the C ABI supports unwinding with a fairly expressive set of tools for describing stack-pointer state on a per-instruction level. Even the simpler Microsoft ABI essentially uses bytecode for that[1]; and on the more complicated Itanium ABI, you get DWARF CFI instructions, which make the correct way to preserve a(n x86) register in the function prologue look like
push rbx
.cfi_adjust_cfa_offset 8
.cfi_rel_offset rbx, 0
which are impossible to miss when reading compiler-generated assembly because of the sheer amount of annoying noise they create.
The Go authors decided to sidestep all of this complexity, which is understandable to a degree, but apparently they did not think through all the ramifications of doing so.
[1] https://learn.microsoft.com/en-us/cpp/build/exception-handli...
Did you compile with optimisations? I think GCC will do a bunch of activity on the stack with -O0, but it'll generally coalesce everything into one push/pop per function with optimisations (not because of any rule, but just because it's faster). alloca and other dynamic stack allocation may break this, but normal variables should in pretty much all cases just get turned into one block on the stack (with appropriate re-use of space if variable lifetimes don't overlap).
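If anyone wants to check this in godbolt, here's a minimal sketch to paste in and compare at -O0 vs -O2. The function names (sum_fixed, sum_dynamic) are made up for illustration, and the volatile qualifiers are only there on the assumption that the optimiser would otherwise delete the arrays entirely. What the prologue actually looks like will depend on compiler, version and target, so treat it as something to experiment with rather than a guaranteed result.

```c
#include <alloca.h>
#include <stddef.h>

/* Hypothetical example: fixed-size locals, so the frame size is known
 * at compile time and the compiler is free to reserve it in one go.
 * volatile is an assumption to keep the arrays on the stack at -O2. */
int sum_fixed(int seed)
{
    volatile int a[16];
    volatile int b[16];
    int total = 0;

    for (int i = 0; i < 16; i++) {
        a[i] = seed + i;
        b[i] = seed - i;
    }
    for (int i = 0; i < 16; i++)
        total += a[i] + b[i];
    return total;
}

/* Hypothetical example: the size is only known at run time, so the
 * stack pointer has to be adjusted again after the prologue. */
int sum_dynamic(size_t n)
{
    volatile int *buf = alloca(n * sizeof *buf);
    int total = 0;

    for (size_t i = 0; i < n; i++)
        buf[i] = (int)i;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

The thing to look at is how many separate instructions touch the stack pointer in each function's prologue and epilogue, and how that changes between the two optimisation levels.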
You need to look at non-x86 architectures. It was common years ago on MIPS.
* https://jdebp.uk/FGA/function-perilogues.html#StandardMIPS
I wrote up the x86 equivalent of doing just two read-modify-write operations on the stack pointer over 16 years ago.
* https://jdebp.uk/FGA/function-perilogues.html#Standardx86