I'm not sure how privilege escalation would be an issue, since you'd never escalate privilege in the first place (I'm assuming you're talking about CPU ring privileges and not OS privileges). You'd just enqueue your operations into the shared kernel/user-space ring buffer and the kernel would pick them up on its side; you'd never jump between rings.
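To make the enqueue/pick-up idea concrete, here's a rough sketch of a single-producer/single-consumer ring, simulated in plain Python. Everything in it (`Op`, `SubmissionRing`, `submit`, `drain`) is a made-up name for illustration, not any real kernel's API, and a real shared-memory version would also need atomics and memory barriers on the head and tail indices, which this toy skips.

```python
from dataclasses import dataclass


@dataclass
class Op:
    opcode: str  # hypothetical operation name, e.g. "read"
    arg: int     # hypothetical argument, e.g. a file descriptor


class SubmissionRing:
    """Fixed-size single-producer/single-consumer ring (illustrative only).

    tail is advanced only by the producer (the "user" side),
    head only by the consumer (the "kernel" side).
    """

    def __init__(self, capacity: int):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.slots = [None] * capacity
        self.mask = capacity - 1
        self.head = 0
        self.tail = 0

    def submit(self, op: Op) -> bool:
        """User side: enqueue without any ring transition. False if full."""
        if self.tail - self.head == len(self.slots):
            return False
        self.slots[self.tail & self.mask] = op
        self.tail += 1
        return True

    def drain(self) -> list:
        """Kernel side: pick up everything submitted so far, in order."""
        ops = []
        while self.head != self.tail:
            ops.append(self.slots[self.head & self.mask])
            self.head += 1
        return ops


# The "user" submits five ops into a four-slot ring; the fifth is rejected.
ring = SubmissionRing(4)
for fd in range(5):
    accepted = ring.submit(Op("read", fd))

# Later, the "kernel" drains the ring on its own core.
picked_up = ring.drain()
```

The point is just that `submit` is an ordinary memory write from the process's point of view; nothing in it looks like a syscall or trap.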
Such a design may require at least one processor dedicated to running the kernel at all times, so it might not work on a single-processor architecture. However, single-processor architectures might be supportable by having the "kernel process" go to sleep after arming a timer, with the timer interrupt being the only one that's specially mapped, so that it can modify the page table to resume the kernel (to handle all the ring buffers and scheduling). As you note, there's some reserved address space, but it's a trivial amount, just enough to be able to resume running the kernel. I don't think it has anything to do with monolithic vs. microkernels.
True, you don't have to go full microkernel just to have messages passed through a buffer. However, if the buffer is shared by all processes, it does need to have some protection. I guess you could assign one buffer per process (which might end up using a lot of physical RAM), and then just crash the process if it corrupts its own buffer. The bigger issue with this approach might be adapting to asynchrony, though.
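As a rough illustration of the "one buffer per process, crash it if it corrupts its own buffer" idea, here's a toy Python simulation. The names (`ProcessRing`, `kernel_poll`, `KNOWN_OPCODES`) and the entry format are all invented for the example. The assumption is that the kernel keeps its own private read index and treats everything the process can write as untrusted.

```python
KNOWN_OPCODES = {"read", "write", "close"}  # hypothetical opcode set


class ProcessRing:
    """One ring per process (illustrative only)."""

    def __init__(self, pid: int, capacity: int = 4):
        self.pid = pid
        self.capacity = capacity
        self.slots = [None] * capacity  # writable by the process
        self.tail = 0                   # writable by the process
        self.kernel_head = 0            # kernel-private, never read from the process
        self.alive = True


def kernel_poll(ring: ProcessRing) -> list:
    """Drain one process's ring; kill the process on any inconsistency."""
    if not ring.alive:
        return []
    pending = ring.tail - ring.kernel_head
    if pending < 0 or pending > ring.capacity:
        ring.alive = False  # bogus tail index: the process corrupted its ring
        return []
    ops = []
    for i in range(ring.kernel_head, ring.tail):
        entry = ring.slots[i % ring.capacity]
        well_formed = (
            isinstance(entry, tuple)
            and len(entry) == 2
            and entry[0] in KNOWN_OPCODES
        )
        if not well_formed:
            ring.alive = False  # malformed entry: drop the batch, kill the process
            return []
        ops.append(entry)
    ring.kernel_head = ring.tail
    return ops


# A well-behaved process submits one valid entry.
good = ProcessRing(pid=1)
good.slots[0] = ("read", 3)
good.tail = 1
good_ops = kernel_poll(good)

# A misbehaving process scribbles over its own tail index.
bad = ProcessRing(pid=2)
bad.tail = 99
bad_ops = kernel_poll(bad)
```

Because each ring belongs to exactly one process, a corrupted ring only ever takes down its owner; the other process's ring is untouched.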
I've wondered for a while why we don't have "security cores" that are really slow but have no caches or speculative execution, so you can run security-critical code without having to worry about CPU timing attacks.