Sheesh. Can something this complicated ever truly be said to work?
You can limit yourself to the performance of a 1 MHz 6502 with no OS if you don't like it. Even MS-DOS on an 8086 with 640K of RAM allows for things that require complexity of this type (not spin locks, but the tricks needed to make "terminate and stay resident" programs work are evil in a similar way).
Yes, if you're careful. Actually careful, not pretend careful. Which is pretty normal in C and C++.
Isn't it the opposite? The complication is evidence of function. The simple code doesn't work.
The OS kernel's runqueue uses a spinlock to schedule everything, so it works. Should you ever use a spinlock in application code? No. Let the OS handle it via the synchronization primitives in whatever language your app is written in.
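For what "use the language's primitives" might look like in C++, here is a minimal sketch, not anyone's production code. It guards a shared counter with `std::mutex` and `std::lock_guard` instead of a hand-rolled spinlock. The function name `count_with_mutex` and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the thread): several threads increment one
// shared counter, serialized by a standard-library mutex.
long count_with_mutex(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);

    long counter = 0;
    // On typical implementations a contended std::mutex lets the OS put the
    // waiting thread to sleep instead of burning CPU in a busy loop.
    std::mutex counter_mutex;

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // lock_guard releases the mutex when it goes out of scope.
                std::lock_guard<std::mutex> guard(counter_mutex);
                ++counter;
            }
        });
    }
    // Wait for every worker before reading the final value.
    for (auto& worker : workers) {
        worker.join();
    }
    return counter;
}
```

Calling `count_with_mutex(4, 10000)` should return 40000, since every increment happens under the lock. With a user-space spinlock, a waiter keeps spinning even if the lock holder has been descheduled, which is the situation the advice above is trying to avoid.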