Maybe tangentially related but I always think it's silly that every linux process has the same libgcc_so.so.1 loaded into memory for each process even though the raw binary for the library is exactly the same so you end up with like 800 copies of libgcc_so.so.1 in memory.
I mean maybe this has been optimized for already and I don't know what I'm talking about but maybe someone with more knowledge about the kernel knows? Is this something we simply can't optimize for because of security implications?
Typically libgcc_so.so is loaded by the linker, which uses an mmap call to map the binary into the address space.
> The kernel keeps track of which file is mapped where, and can detect when a request is made to map an already mapped file again, avoiding physical memory allocation if possible.
Relevant stack overflow answer: https://stackoverflow.com/questions/61950951/linux-shared-li...
In Linux, when a shared lib is loaded by multiple processes, its loaded once and not duplicated in ram. Only if a memory page is modified by the process will the memory be duplicated. (Hope I have explained that correctly)
Those mappings by default all go to the same shared memory.
Unices have been sharing executable memory between processes longer than there's been mmap for user space to do the same thing themselves. I remember seeing it in the 2BSD kernel for instance.
Eh? Aren't shared libraries actually shared in memory?
I have a rule for myself. If I think something is silly or stupid, I assume I don't understand it. I usually find I do not understand it, and it no longer seems silly when I do understand it.
In this case too, you think it is silly because you don't understand it. Your assumptions are wrong, making it seem silly.
Shared libraries (and mmapped files in general) are deduplicated; it's nowhere near as bad as you think. The kernel loads a .so into memory once and then maps that memory into every process that mmaps it.
Editing to add: this deduplication is one of the greatest upsides to dynamic linking. Common libs like libgcc and libc only have to exist in memory once and can stay in CPU caches, whereas if they were statically linked into every binary, each binary would have a copy of that library that wouldn't be shared with anything else and you'd waste a lot of memory.