I recently started using Microsoft's mimalloc (via an LD_PRELOAD) to better use huge (1 GB) pag...

bmenrigh • today at 6:40 PM • 10 replies • view on HN

I recently started using Microsoft's mimalloc (via an LD_PRELOAD) to better use huge (1 GB) pages in a memory intensive program. The performance gains are significant (around 20%). It feels rather strange using an open source MS library for performance on my Linux system.

There needs to be more competition in the malloc space. Between various huge page sizes and transparent huge pages, there are a lot of gains to be had over what you get from a default GNU libc.

Replies

skavi • today at 7:45 PM

We evaluated a few allocators for some of our Linux apps and found (modern) tcmalloc to consistently win in time and space. Our applications are primarily written in Rust and the allocators were linked in statically (except for glibc). Unfortunately I didn't capture much context on the allocation patterns. I think in general the apps allocate and deallocate at a higher rate than most Rust apps (or more than I'd like at least).

Our results from July 2025:

rows are <allocator>: <RSS>, <time spent for allocator operations>

  app1:
  glibc: 215,580 KB, 133 ms
  mimalloc 2.1.7: 144,092 KB, 91 ms
  mimalloc 2.2.4: 173,240 KB, 280 ms
  tcmalloc: 138,496 KB, 96 ms
  jemalloc: 147,408 KB, 92 ms

  app2, bench1
  glibc: 1,165,000 KB, 1.4 s
  mimalloc 2.1.7: 1,072,000 KB, 5.1 s
  mimalloc 2.2.4:
  tcmalloc: 1,023,000 KB, 530 ms

  app2, bench2
  glibc: 1,190,224 KB, 1.5 s
  mimalloc 2.1.7: 1,128,328 KB, 5.3 s
  mimalloc 2.2.4: 1,657,600 KB, 3.7 s
  tcmalloc: 1,045,968 KB, 640 ms
  jemalloc: 1,210,000 KB, 1.1 s

  app3
  glibc: 284,616 KB, 440 ms
  mimalloc 2.1.7: 246,216 KB, 250 ms
  mimalloc 2.2.4: 325,184 KB, 290 ms
  tcmalloc: 178,688 KB, 200 ms
  jemalloc: 264,688 KB, 230 ms

tcmalloc was from github.com/google/tcmalloc/tree/24b3f29.

i don't recall which jemalloc was tested.

➕ show 3 replies

pjmlp • today at 6:54 PM

If you go into Dr Dobbs, The C/C++ User's Journal and BYTE digital archives, there will be ads of companies whose product was basically special cased memory allocator.

Even toolchains like Turbo Pascal for MS-DOS, had an API to customise the memory allocator.

The one size fits all was never a solution.

m463 • today at 8:01 PM

I remember in the early days of web services, using the apache portable runtime, specifically memory pools.

If you got a web request, you could allocate a memory pool for it, then you would do all your memory allocations from that pool. And when your web request ended - either cleanly or with a hundred different kinds of errors, you could just free the entire pool.

it was nice and made an impression on me.

I think the lowly malloc probably has lots of interesting ways of growing and changing.

➕ show 1 reply

adgjlsfhk1 • today at 6:47 PM

One of the best parts about GC languages is they tend to have much more efficient allocation/freeing because the cost is much more lumped together so it shows up better in a profile.

➕ show 3 replies

pocksuppet • today at 7:36 PM

In many cases you can also do better than using malloc e.g. if you know you need a huge page, map a huge page directly with mmap

Yes, if you want to use huge pages with arbitrary alloc/free, then use a third-party malloc. If your alloc/free patterns are not arbitrary, you can do even better. We treat malloc as a magic black box but it's actually not very good.

codexon • today at 7:01 PM

I've been using jemalloc for over 10 years and don't really see a need for it to be updated. It always holds up in benchmarks against any new flavor of the month malloc that comes out.

Last time I checked mimalloc which was admittedly a while ago, probably 5 years, it was noticebly worse and I saw a lot of people on their github issues agreeing with me so I just never looked at it again.

➕ show 2 replies

IshKebab • today at 7:13 PM

I feel like the real thing that needs to change is we need a more expressive allocation interface than just malloc/realloc. I'm sure that memory allocators could do a significantly better job if they had more information about what the program was intending to do.

➕ show 1 reply

anthk • today at 7:23 PM

I used mimalloc to run zenlisp under OpenBSD as it would clash with the paranoid malloc of base.

jeffbee • today at 7:12 PM

Just out of curiosity are you getting 1GB huge pages on Xeon or some other platform? I always thought this class of page is the hardest to exploit, considering that the machine only has, if I recall correctly, one TLB slot for those.

➕ show 1 reply

sylware • today at 6:46 PM

If there is so much performance difference among generic allocators, it means you need semantic optimized allocators (unless performance is actually not that much important in the end).

➕ show 2 replies

alt Hacker News

Replies