I'm curious, what are you working on that requires writing inline assembly?
Might be an interpreter or an emulator. That’s where you often want to preserve registers or flags and have jump tables.
This is one of the remaining cases where current compilers still optimize rather poorly: a tight loop around a huge switch statement, with each case performing a very small operation on shared data.
In that case, a human writing assembly can often beat a compiler by a huge margin.
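For anyone who hasn't seen the pattern, here is a minimal sketch in C of the kind of loop I mean: a tight loop around a switch, where every case does a tiny amount of work on shared state. The opcode names and the `run` function are made up for illustration and are not taken from any real interpreter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_HALT, OP_INC, OP_DEC, OP_DOUBLE, OP_ADD_IMM };

/* Executes bytecode until OP_HALT and returns the accumulator.
 * Returns -1 if it hits an opcode it does not know. */
static int32_t run(const uint8_t *code)
{
    int32_t acc = 0;   /* shared state touched by every case */
    size_t pc = 0;     /* index of the next byte to fetch */

    for (;;) {
        switch (code[pc++]) {
        case OP_INC:     acc += 1;          break;
        case OP_DEC:     acc -= 1;          break;
        case OP_DOUBLE:  acc *= 2;          break;
        case OP_ADD_IMM: acc += code[pc++]; break; /* one-byte operand follows */
        case OP_HALT:    return acc;
        default:         return -1;
        }
    }
}
```

Feeding it `{OP_INC, OP_INC, OP_DOUBLE, OP_ADD_IMM, 5, OP_DEC, OP_HALT}` returns 8. Compilers typically lower a dense switch like this to a single indirect jump through a table that every iteration shares, and that dispatch cost is usually what the hand-written version goes after.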
I'm not them, but whenever I've used it, it's been for arch-specific features like adding a debug breakpoint, synchronization, using system registers, etc.
Never for performance. If I wanted to hand-optimise code I'd be more likely to use SIMD intrinsics, play with C until the compiler does the right thing, or write the entire function in a separate asm file. That gives better syntax highlighting and makes it easier to handle state at the ABI boundary rather than mid-function, as with the carry flags mentioned above.
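To make the arch-specific uses concrete, here is a rough sketch using GCC/Clang extended asm, assuming x86-64 or AArch64. The helper names (`debug_break`, `full_barrier`, `read_counter`) are hypothetical. The counter read is only a stand-in for "using system registers": it picks counters that are normally readable from user mode, whereas most real system registers need kernel privileges.

```c
#include <stdint.h>

/* Stops in an attached debugger; without one the process normally
 * dies with SIGTRAP, so this is defined here but not called. */
static inline void debug_break(void)
{
#if defined(__x86_64__)
    __asm__ volatile("int3");
#elif defined(__aarch64__)
    __asm__ volatile("brk #0");
#else
    __builtin_trap(); /* assumption: no known breakpoint instruction */
#endif
}

/* Full memory barrier. The "memory" clobber also stops the compiler
 * from reordering memory accesses across this point. */
static inline void full_barrier(void)
{
#if defined(__x86_64__)
    __asm__ volatile("mfence" ::: "memory");
#elif defined(__aarch64__)
    __asm__ volatile("dmb ish" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* builtin fallback, no asm */
#endif
}

/* Reads a hardware counter: the time-stamp counter on x86-64, the
 * virtual counter register on AArch64, 0 on anything else. */
static inline uint64_t read_counter(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
    return v;
#else
    return 0;
#endif
}
```

Each of these is one instruction with no state that has to survive across it, which is why inline asm is comfortable here in a way it isn't for hand-optimised loops.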