logoalt Hacker News

A single-file C allocator with explicit heaps and tuning knobs

43 pointsby endukulast Wednesday at 5:49 PM23 commentsview on HN

Comments

endukulast Wednesday at 5:49 PM

I wrote this because I wanted more explicit control over heaps when building different subsystems in C. Standard options like jemalloc and mimalloc are incredibly fast, but they act as black boxes. You can't easily cap a parser's memory at 256MB or wipe it all out in one go without writing a custom pool allocator.

Spaces takes a different approach. It uses 64KB-aligned slabs, and the metadata lookup is just a pointer mask (ptr & ~0xFFFF).

The trade-off is that every free() incurs an L1 cache miss to read the slab header, and there is a 64KB virtual memory floor per slab. But in exchange, you get zero-external-metadata regions, instant teardown of massive structures like ASTs, and performance that surprisingly keeps up with jemalloc on cross-thread workloads (I included the mimalloc-bench scripts in the repo).

It's Linux x86-64 only right now. I'm curious if systems folks think this chunk API is a pragmatic middle ground for memory management, or if the cache-miss penalty on free() makes the pointer-masking approach a dead end for general use.

show 2 replies
ntoslinuxtoday at 1:31 PM

What is the reason for the weird `{ code };` blocks everywhere and is the below code machine generated?

```c ((PageSize) (chunk->pageSize - ((PageSize) ((PageSize) ((PageSize) (sizeof(Page) + (sizeof(struct _Block))) + (PageSize) ((sizeof(double)) - 1u)) & ((PageSize) (~((PageSize) ((sizeof(double)) - 1u)))))) - ((PageSize) ((PageSize) ((PageSize) ((sizeof(FreeBlock) + sizeof(PageSize))) + (PageSize) (((((sizeof(double)) > (4)) ? (sizeof(double)) : (4))) - ```

show 2 replies
HexDecOctBintoday at 1:15 PM

The classic Doug Lee's memory allocator[1] has explicit heaps by the name of mspaces. OP, were you aware of that; and if yes, what does your solution do better or different than dlmalloc's mspaces?

[1] https://gee.cs.oswego.edu/pub/misc/?C=N;O=D

lstoddtoday at 4:18 PM

Zig got this right to such a degree that I'm sometimes tempted to export its allocators to C via FFI. Then I sober up a bit and just rewrite it all in zig, all its instability nonwithstanding.

show 1 reply
jeffbeetoday at 3:57 PM

"That costs ~5 ns when the line is cold"

I don't see how that could possibly be true. Sounds like a low-ball estimate.

Also i wish to point out that the "tcmalloc" being used as a baseline in these performance claims is Ye Olde tcmalloc, the abandoned and now community-maintained version of the project. The current version of tcmalloc is a completely different thing that the mimalloc-bench project doesn't support (correctly; I just checked).