The designers of NFS chose to make a distributed system emulate a highly consistent and available system (a hard drive), which was (and is) a reasonable tradeoff. It didn't require every existing tool, such as ls, to deal with things like the server rebooting while listing a directory. (The original NFS protocol is stateless, so clients can survive server reboots.) What does vi do when the server hosting the file you're editing stop responding? None of these tools have that kind of error handling.
I don't know how io_uring solves this - does it return an error if the underlying NFS call times out? How long do you wait for a response before giving up and returning an error?
> The designers of NFS chose to make a distributed system emulate a highly consistent and available system (a hard drive),
> The original NFS protocol is stateless,
The protocol is, but the underlying disk isn’t.
- A stateless emulation doesn’t know of the concept of “open file”, so “open for exclusive access” isn’t possible, and ways to emulate that were bolted on.
- In a stateless system, you cannot open a scratch file for writing, delete it, and continue using it, in the expectation that it will be deleted when you’re done using it (Th Unix Hater’s handbook (https://web.mit.edu/~simsong/www/ugh.pdf) says there are hacks inside NFS to make this work, but that makes the protocol stateful)
> It didn't require every existing tool, such as ls, to deal with things like the server rebooting while listing a directory
But see above for an example where every tool that wants to do record locking or get exclusive access to a file has to know whether it’s writing to a NFS disk to figure out how to do that.
> The designers of NFS chose to make a distributed system emulate a highly consistent and available system (a hard drive), which was (and is) a reasonable tradeoff
I don't agree that it was a reasonable tradeoff. Making an unreliable system emulate a reliable one is the very thing I find to be a bad idea. I don't think this is unique to NFS, it applies to any network filesystem you try to present as if it's a local one.
> What does vi do when the server hosting the file you're editing stop responding? None of these tools have that kind of error handling.
That's exactly why I don't think it's a good idea to just pretend a network connection is actually a local disk. Because tools aren't set up to handle issues with it being down.
Contrast it with approaches where the client is aware of the network connection (like HTTP/GRPC/etc)... the client can decide for itself how long it should retry failed requests, whether it should bubble up failures to the caller, or work "offline" until it gets an opportunity to resync, etc. With NFS the syscall just hangs forever by default.
Distributed systems are hard, and NFS (and other similar network filesystems) just pretend it isn't hard at all, which is great until something goes wrong, and then the abstraction leaks.
(Also I didn't say io_uring solves this, but I'm curious as to whether its performance would be any better than blocking calls.)