Hacker News

inkyoto, today at 3:51 AM

> Idle threads do increase the amount of committed stack.

I am not clear on why the stack of an idle thread would continue to grow. If a previously processed unit of work caused a large number of the pages backing the thread stack to be committed, then yes, those pages usually stay committed: it is uncommon to unmap pages that are no longer required. That is a deliberate trade-off, as shrinking a stack automatically is difficult to do safely and cheaply.

Put simply, being idle does not make a stack grow.

> The nt kernel actually works similarly to Linux w.r.t. processes and threads.

Respectfully, this is slightly more than entirely incorrect.

Linux uses a single kernel abstraction, «task_struct», as the one schedulable kernel object for both what user space calls a process and what user space calls a thread. A «process» is essentially a thread group leader plus a bundle of shared resources. Linux consolidated these abstractions decades ago in order to support POSIX threads at the kernel level.
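A rough way to see the thread group from user space, as a sketch only: it assumes Linux and Python 3.8+ (for «threading.get_native_id»), and the helper name «thread_ids» is made up for illustration. Every thread reports the same PID (the thread group ID) but has its own kernel thread ID, which is what you would expect if each thread is its own schedulable task.

```python
import os
import threading


def thread_ids() -> dict:
    """Return (pid, kernel thread id) as seen by the caller and by a worker thread."""
    seen = {"caller": (os.getpid(), threading.get_native_id())}

    def record() -> None:
        # Runs in the worker: same process, but a different kernel task.
        seen["worker"] = (os.getpid(), threading.get_native_id())

    worker = threading.Thread(target=record)
    worker.start()
    worker.join()
    return seen


if __name__ == "__main__":
    for name, (pid, tid) in thread_ids().items():
        print(f"{name}: pid={pid} tid={tid}")
```

When this is run from the main thread, the caller's TID should equal the PID, since the thread group leader's TID is the thread group ID.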

fork(2) is, in effect, clone(2) with a particular set of flags, so what you get depends on those flags: whether the new task shares the VM (CLONE_VM), open files (CLONE_FILES), FS context (CLONE_FS) and signal handlers (CLONE_SIGHAND), and whether it joins the caller's thread group (CLONE_THREAD). A fork shares none of these, so it creates a new thread group with its own memory management (populated using copy-on-write), a separate signal disposition context, etc.
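The practical difference between the two flag sets can be sketched from user space. This assumes a Unix-like system where «os.fork» is available, and the function names are hypothetical: a thread shares the caller's address space, so its write is visible afterwards, while a forked child writes to its own copy-on-write pages and the parent never sees the change.

```python
import os
import threading


def mutate_in_thread(state: dict) -> None:
    """Mutate state from a thread, which shares the caller's address space."""
    worker = threading.Thread(target=state.update, kwargs={"owner": "thread"})
    worker.start()
    worker.join()


def mutate_in_fork(state: dict) -> int:
    """Mutate state in a forked child and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's private copy of the pages.
        state["owner"] = "child"
        os._exit(0 if state["owner"] == "child" else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    state = {"owner": "parent"}
    mutate_in_thread(state)
    print("after thread:", state["owner"])
    state["owner"] = "parent"
    mutate_in_fork(state)
    print("after fork:", state["owner"])
```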

Windows has two different kernel objects: a process (EPROCESS) and a thread (ETHREAD/KTHREAD). Threads are the schedulable entities; a process is the container for address space, handle table, security token, job membership, accounting, etc. They are tightly coupled, but not «the same thing».

On Windows, «CreateProcess» is heavier than Linux fork for structural reasons: it builds a new process object, maps an image section, creates the initial thread, sets up the PEB/TEB, initialises the loader path, environment, mitigations, etc. A chunk of that work is kernel-side and a chunk is user-mode (notably the loader and, for Win32, subsystem involvement). Blaming only «userspace» is wrong.

Defender (and third-party AV/EDR) can measurably slow process creation because it tends to inspect images, scripts, and memory patterns around process start, not because of deficiencies in the kernel or system call design.