It is somewhat interesting that the most widely used "big" OS that doesn't use fork, i.e. Windows, has dog slow process creation...
I agree that there should be non-fork primitives, I'm just not that sure that performance is the best argument.
Because that OS best practices is to use threads.
Traditionally Windows applications that create processes all the time come from UNIX heritage.
Contrary to UNIX, Windows NT was designed with threads first mentality, from the get go.
While on UNIX they were added after fact, and to this day there are gotchas mixing posix threads with signals, fork and exec.
I suspect it's a long tail sort of thing; it mostly doesn't matter except when it really matters. It's interesting that the stated motivation for the patch is in the context of agentic tools spawning subcommands. There's some related prior art in this area where the payoffs could be much greater, like fuzzing: https://gts3.org/assets/papers/2017/xu:os-fuzz.pdf is an example. It would be very interesting to see this patch applied to e.g. AFL++
That's not the reason for the performance difference. Windows does have a fork primitive (ZwCreateProcess) and it's still slower than Linux's equivalent.
The problem with fork isn't really that it's slow. The problem is that if you want it to be not-slow, it locks you into a bunch of OS design decisions: you more or less need a memory subsystem where all writable pages are refcounted and copy-on-write when the refcount is bigger than 1, and you need overcommit.
Now these decisions aren't objectively bad, but they have significant trade-offs and it's probably not a good idea that they're forced simply because we use fork()+exec() for process creation.