Unless you reserve on fork you're still over committing because after the fork writes to basically any page in either process will trigger memory commitment.
Thread stacks come up because reserving them completely ahead of time would incur large amounts of memory usage. Typically they start small and grow when you touch the guards. This is a form of overcommit. Even windows dynamically grows stacks like this
> […] because after the fork writes to basically any page in either process will trigger memory commitment.
This is largely not true for most processes. For a child process to start writing into its own data pages en masse, there has to exist a specific code path that causes such behaviour. Processes do not randomly modify their own data space – it is either a bug or a peculiar workload that causes it.
You would have a stronger case if you mentioned, e.g., the JVM, which has a high complexity garbage collector (rather, multiple types of garbage collectors – each with its own behaviour), but the JVM ameliorates the problem by attempting to lock in the entire heap size at startup or bailing if it fails to do so.
In most scenarios, forking a process has a negligible effect on the overall memory consumption in the system.
Also,
> Thread stacks come up because reserving them completely ahead of time would incur large amounts of memory usage. Typically they start small and grow when you touch the guards. This is a form of overcommit.
Ahead of the time memory reservation entails a page entry being allocated in the process’s page catalogue («logical» allocation), and the page «sits» dormant until it is accessed and causes a memory access fault – that is the moment when the physical allocation takes place. So copying reserved but not accessed yet pages has zero effect on the physical memory consumption of the process.
What actually happens to the thread stacks depends on the actual number of active threads. In modern designs, threads are consumed from thread pools that implement some sort of a run queue where the threads sit idle until they get assigned a unit of work. So if a thread is idle, it does not use its own stack thread and, consequently, there is no side effect on the child's COW address space.
Granted, if the child was copied with a large number of active threads, the impact will be very different.
> Even windows dynamically grows stacks like this
Windows employs a distinct process/thread design, making the UNIX concept of a process foreign. Threads are the primary construct in Windows and the kernel is highly optimised for thread management rather than processes. Cygwin has outlined significant challenges in supporting fork(2) semantics on Windows and has extensively documented the associated difficulties. However, I am veering off-topic.