Looks like the default allocator uses mmap(2) for every single allocation, which is horribly inefficient - you map a whole PAGE_SIZE worth of memory for every tiny string. Aside from just wasting memory this will make the TLB very unhappy.
It looks like sp_log's string formatting is entirely unbuffered which results in lots of tiny write syscalls.
The point of the library is that you do not call the low level allocation primitive to allocate a single string. Of course, in simple programs which exit immediately, there is no difference between using a page allocator and a heap allocator. In real programs, I use an appropriate allocator for the allocation rather than making arbitrary calls to malloc(). In the sp.h examples, I use the page allocator to keep freestanding Linux simple. I could swap out a single line to be backed by an arena, but it misses the forest for the trees.
sp_log() writes directly to an IO writer. An IO writer can be buffered or unbuffered, but is unbuffered by default. This is a feature, not a bug. Have a look through the IO code!
Cheers and thanks for reading.
Not just the TLB, but the L1 D$ will be very unhappy as well. All heap objects being page aligned on most microarchs ends up making every object start at cache set 0 because the set determination ends up being indexed off of the offest within a page so that the TLB lookup can happen in parallel with the set load.
Jesus! Claude could've told this guy all these things. People underestimate how much the average malloc implementation does and how many considerations it makes. Or how much IO sucks.
That seems to be a pretty consistent quality level for the entire library. Look at the implementations in sp_math, yikes.