Windows developer here. After reading this post, my gut instinct is that this is due to something called 'segment heap'.
A bit of backstory: there are two, totally independent implementations behind the Windows heap allocation APIs (i.e. the implementation code behind RtlHeapAlloc and RtlHeapFree, which are called by malloc/free). The older of the two, developed uring the Dave Cutler era, is known as the "NT heap". The newer implementation, developed in the 2010s, is known as "segment heap". This is all documented online if anyone wants to read more. When development on segment heap was completed, it was known to be superior to the NT heap in many ways. In particular, it was more efficient in terms of memory footprint, due to lower fragmentation-related waste. Segment heap was smarter about reusing small allocations slots that were recently free'd. But, as ever, Windows was very serious about legacy app compat. Joel Spolsky calls this the 'Raymond Chen camp'. So, they didn't want to turn segment heap on universally. It was known that a small portion of legacy software would misbehave and do things like, rely on doing a bit of use-after-free as a treat. Or worse, it took dependencies on casting addresses to internal NT heap data structures. So, the decision at the time was to make segment heap the default for packaged executables. At that time, Windows Phone still existed, and Microsoft was pushing super hard on the Universal platform being the new, recommended way to make apps on Windows. So they thought we'd see a gradual transition from unpackaged executables to packaged, and thus, a gradual transition from NT heap to segment heap. The dream of UWP died, and the Windows framework landscape is more fragmented than ever. Most important software on Windows is still unpackaged, and most of it runs on x64.
Why does this matter? Because segment heap is also enabled by default on arm. Same logic as the packaged vs unpackaged decision. Arm64 binaries on Windows are guaranteed not to be ancient, unmaintained legacy code. Arm64 windows devices have been a big success, and users widely report that they feel more responsive than x64 devices.
A not insignificant part of why Windows feels better on arm is because segment heap is enabled by default on arm.
I'd be interested to see how this test turns out if you force segment heap on x64. You can do it on a per-executable basis via creating a DWORD value named FrontEndHeapDebugOptions under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<myExeName>.exe, and giving it a value of 8.
You can turn it on globally for all processes by creating a DWORD value named "Enabled" under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Segment Heap, and giving it a value of 3. I do this on my dev machine and have encountered zero problems. The memory footprint savings are pretty crazy. About 15% in my testing.
> You can turn it on globally for all processes by creating a DWORD value named "Enabled" under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Segment Heap, and giving it a value of 3
I had previously seen this described as 0 vs non-zero. Since you have some inside experience :), anything special about 3 instead? What about 2? How would I find these value meanings out on my own (if that's even possible)?
Thanks!
This feels like it deserves to live somewhere on a blog, not as a comment on some forum. This is really interesting thanks for sharing.
Wonderful breakdown. I love reading this kind of thing. thank you
For those interested, you can opt-in to this behavior via the application manifest for your own executables: set heapType to SegmentHeap https://learn.microsoft.com/en-us/windows/win32/sbscs/applic...