I'm a little surprised that this bug wasn't fixed in the assembler as a special case for immediate adds to RSP. If the patch was to the compiler only, other instances of the bug could be lurking out there in aarch64 assembly code.
Would that be wise? The implemented solution uses a temporary register to hold the full value being added to rsp.
I don't know enough about how people use the Go assembler, but I imagine it would be very surprising if `add $imm, rsp, rsp` clobbered an unrelated register whenever `$imm` is large enough. Especially since what gets clobbered is the designated "temporary register", which I imagine is used all the time in handwritten Go assembly.
Is that possible? I think you would have to use a register [1] to build up the immediate value. The assembler cannot (and should not) default to one, so I think the best one could do is to add another macro for ADD that takes that helper register as an argument. That wouldn’t fix other instances in existing AArch64 assembly code.
[1] I’m not familiar with AMD64, but maybe you could use a thread-local, or reserve space in the stack frame for it. (Edit: a thread-local wouldn’t work with M:N threads. You’d need a coroutine-local, which would tie the assembler to Go and would, on that alone, be a very bad idea.) I don’t see either of those as a realistic option.
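For anyone wondering what "large enough" means here: an AArch64 ADD (immediate) encodes a 12-bit unsigned value, optionally shifted left by 12. Anything else has to be split across more than one instruction or built up in a register first, which is where the temporary register comes in. Below is a rough Go sketch of that check. The names `fitsAddImm` and `splitAddImm` are made up for illustration and are not taken from the Go assembler's source.

```go
package main

import "fmt"

// fitsAddImm reports whether v can be encoded in a single AArch64
// ADD (immediate): a 12-bit unsigned value, optionally shifted left by 12.
// Hypothetical helper for illustration only.
func fitsAddImm(v uint64) bool {
	if v < 1<<12 {
		return true // plain 12-bit immediate
	}
	// 12-bit immediate with LSL #12: low 12 bits must be zero.
	return v&0xfff == 0 && v>>12 < 1<<12
}

// splitAddImm splits a value below 1<<24 into a low 12-bit part and a
// high part that is a multiple of 4096, so that each part fits one ADD.
// ok is false when v needs more than 24 bits, in which case some other
// strategy (such as materializing v in a register) would be required.
// Hypothetical helper for illustration only.
func splitAddImm(v uint64) (lo, hi uint64, ok bool) {
	if v >= 1<<24 {
		return 0, 0, false
	}
	return v & 0xfff, v &^ 0xfff, true
}

func main() {
	for _, v := range []uint64{0x10, 0xfff, 0x1000, 0x1008, 0x123456, 0x1000000} {
		lo, hi, ok := splitAddImm(v)
		fmt.Printf("%#x: single=%v split=(%#x, %#x) ok=%v\n",
			v, fitsAddImm(v), lo, hi, ok)
	}
}
```

Note the trade-off between the two strategies: a two-instruction split needs no scratch register, but the destination (here the stack pointer) holds an intermediate value between the two adds. Building the full value in a register first lets the add happen in one instruction, at the cost of clobbering that register.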