Good post! It covers stuff I didn't know x64 has. Sadly it doesn't answer the "how many registers are behind rax" question I was hoping for. I'd love to know how many outstanding writes one can have to the various architectural registers before the renaming machinery runs out and things stall. Not really for immediate application to life, just a missing part of my mental cost model for x64.
Conservatively though, another answer could be the following, when not counting subset registers as distinct:
16 GP
2 state (flags + IP)
6 seg
4 TRs
11 control
32 ZMM0-31 (repurposes 8 FPU GP regs)
1 MXCSR
6 FPU state
28 important MSRs
7 bounds
6 debug
8 masks
8 CET
10 FRED
=========
145 total
And don't forget another 10-20 for the local APIC.
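For anyone who wants to check the arithmetic or adjust it for a different set of extensions, here is a minimal Python sketch of the tally. The counts are copied from the list above; the dictionary name and its keys are only labels I picked for illustration.

```python
# Per-category counts copied from the list above (subset registers not counted separately).
register_counts = {
    "GP": 16,
    "state (flags + IP)": 2,
    "seg": 6,
    "TRs": 4,
    "control": 11,
    "ZMM0-31": 32,
    "MXCSR": 1,
    "FPU state": 6,
    "important MSRs": 28,
    "bounds": 7,
    "debug": 6,
    "masks": 8,
    "CET": 8,
    "FRED": 10,
}

# Sum every category; remove or edit entries to model a CPU without a given extension.
total = sum(register_counts.values())
print(total)  # 145
```

The local APIC registers are left out here, as in the list, since that figure is only given as a rough 10-20.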
"The answer" depends on the purpose and on which optional extensions are present. A function call, task switching between processes in an OS, and emulating virtual machine process state all have different requirements and expectations. YMMV.
Here's a good list for reference: https://sandpile.org/x86/initial.htm
x86-64 ISA general-purpose register containers: the lower 8 to 16 bits of the 64-bit GPR.
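To make the sub-register point concrete, here is a rough Python sketch that models one 64-bit GPR as an integer and treats the narrower names as views of its low bits. The helpers `read_sub` and `write_sub` and the sample value are hypothetical, made up for illustration. The write rule it models is the usual x86-64 one: 8- and 16-bit writes leave the upper bits alone, while 32-bit writes zero the upper half.

```python
MASK64 = (1 << 64) - 1

def read_sub(reg: int, bits: int) -> int:
    """Return the low `bits` bits of a 64-bit register value (e.g. 8 for AL, 16 for AX)."""
    return reg & ((1 << bits) - 1)

def write_sub(reg: int, bits: int, value: int) -> int:
    """Model a write to the low `bits` bits of a 64-bit register.

    8- and 16-bit writes preserve the upper bits of the register;
    32-bit writes zero-extend into the full 64 bits.
    """
    mask = (1 << bits) - 1
    value &= mask
    if bits == 32:
        return value  # upper 32 bits are cleared
    return (reg & ~mask & MASK64) | value

rax = 0x1122334455667788          # arbitrary sample value
al = read_sub(rax, 8)             # low 8 bits
ax = read_sub(rax, 16)            # low 16 bits
after_al_write = write_sub(rax, 8, 0xFF)
after_eax_write = write_sub(rax, 32, 0xDEADBEEF)

print(hex(al), hex(ax))           # 0x88 0x7788
print(hex(after_al_write))        # 0x11223344556677ff
print(hex(after_eax_write))       # 0xdeadbeef
```

This only models the architectural view; it says nothing about how a given core renames or merges partial registers internally.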
Intel's next gen will add 16 more general purpose registers. Can't wait for the benchmarks.
Even though this post is from 2020, it’s still a classic reference. It’s especially relevant to revisit this baseline now, considering Intel’s APX, which aims to double the GPRs to 32. Understanding how we got here is key to appreciating where the architecture is headed next.
Don't forget that x86_64, like ARM, is IP-locked; RISC-V is not.
This is how many registers the ISA exposes, not the number of registers actually in the CPU. Typical CPUs have hundreds of physical registers. For example, Zen 4's integer register file has 224 registers, and its FP/vector register file has 192 registers (per Wikipedia). This is useful to know because it can affect behavior. E.g. I've seen results where doing a register allocation pass with a large number of registers, followed by a pass with the number of registers exposed in the ISA, leads to better performance.