It seems like it would be possible to implement this in userspace, using shared memory to store the data structures and just one eventfd per thread to park/unpark (or a futex, if the thread isn't waiting on anything else). That should be fully correct and have similar or better performance, at the cost of not being secure or robust against process crashes (which isn't a big problem for most Wine usage).
Neither esync nor fsync does this, though. Why?
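To make the per-thread eventfd park/unpark idea concrete, here is a minimal sketch of just that piece. It is not how esync, fsync, or ntsync work internally; `struct parker`, `park`, `unpark`, and `demo_unpark_before_park` are hypothetical names, and the shared-memory waiter lists (the hard part) are left out entirely. It also assumes the eventfd descriptor has somehow been made available to whichever process needs to wake the thread (for example inherited, or passed over a Unix socket), since a descriptor number stored in shared memory means nothing in another process.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-thread parking slot. Only the waiter lists could live in
 * shared memory; the fd itself must be shared by other means. */
struct parker {
    int efd; /* eventfd in plain counter mode (no EFD_SEMAPHORE) */
};

static int parker_init(struct parker *p)
{
    p->efd = eventfd(0, EFD_CLOEXEC);
    return p->efd < 0 ? -1 : 0;
}

/* Block until at least one unpark has been posted. Returns the number of
 * unparks consumed (the counter is reset to 0), or 0 on error. */
static uint64_t park(struct parker *p)
{
    uint64_t n = 0;
    assert(p->efd >= 0);
    if (read(p->efd, &n, sizeof n) != (ssize_t)sizeof n)
        return 0;
    return n;
}

/* Wake the parked thread, or make its next park() return immediately. */
static int unpark(struct parker *p)
{
    uint64_t one = 1;
    assert(p->efd >= 0);
    return write(p->efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Demo: wakeups posted before park() are not lost; they are coalesced
 * into a single return value. */
static int demo_unpark_before_park(void)
{
    struct parker p;
    uint64_t n;

    if (parker_init(&p) != 0)
        return -1;
    unpark(&p);
    unpark(&p);
    n = park(&p); /* returns at once with the accumulated count */
    close(p.efd);
    return (int)n;
}
```

The point of the demo is that an unpark issued before the thread parks is remembered by the eventfd counter, which is the property a waiter-list implementation would lean on. What this does not show is the part under dispute in this thread: atomically acquiring several objects at once (NT's wait-for-all) while other threads race to signal and consume them.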
Claude thinks that "nobody was motivated enough to write and debug the complex shared-memory waiter-list logic when simpler (if less correct) approaches worked for 95% of games, and when correctness finally mattered enough, the kernel was the more natural place to put it". Is that true?
I don't know the technical details, but the kernel docs say "It exists because implementation in user-space, using existing tools, cannot match Windows performance while offering accurate semantics." https://docs.kernel.org/userspace-api/ntsync.html
> It seems like it would be possible to implement this in userspace using shared memory
It is not. Perhaps it should be, but Linux doesn't provide the userspace facilities needed to do this entirely in userspace.
This is not merely an API shim that lets Windows binaries dynamically link and run. It's an effort to recreate the NT kernel's synchronization and waiting semantics. Doing that requires the Linux kernel's synchronization primitives and scheduler APIs. You can read the code[1] and see that it is a compatibility adapter that relies heavily on Linux kernel primitives and their coordination with the kernel scheduler. No approach using purely userspace synchronization primitives can do this both efficiently and accurately.
[1] https://github.com/torvalds/linux/blob/master/drivers/misc/n...