> Is your intention that people use the Fil-C garbage collector instead of free()? Or is it just a backstop in case of memory leak bugs?
Wow, great question!
My intention is to give folks powerful options. You can choose:
- Compile your code with Fil-C while still maintaining it for Yolo-C. In that case, you'll be calling free(). Fil-C's free() behavior ensures there are no GC-induced leaks (more on that below), so code that does this will not leak in Fil-C.
- Fully adopt Fil-C and don't look back. In that case, you'll probably just lean on the GC. You can still fight GC-induced leaks by selectively free()ing stuff.
- Both of the above, with `#ifdef __FILC__` guards to select what you do. I think you will want to do that if your C program has a custom GC (this is exactly what I did with Emacs: I replaced its super awesome GC with calls to my GC) or if you're doing custom arena allocations (arenas work fine in Fil-C, but you get more security benefit and better memory usage if you replace the arena with plain allocations and rely on the GC).
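For the arena case, here is a minimal sketch of what the `#ifdef __FILC__` split could look like. The `arena` type and the `arena_init`/`arena_alloc`/`arena_reset`/`arena_destroy` functions are hypothetical names for illustration, not code from the Emacs port or from Fil-C itself. In the Fil-C build every allocation becomes its own GC'd object, so per-object bounds and use-after-free checks apply; the conventional build keeps a simple bump allocator.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef __FILC__
/* Fil-C build (sketch): no real arena. Each allocation is its own object,
   and the GC reclaims it once it becomes unreachable. */
typedef struct { int unused; } arena;

static int arena_init(arena *a, size_t cap) { (void)a; (void)cap; return 1; }
static void *arena_alloc(arena *a, size_t n) { (void)a; return malloc(n); }
static void arena_reset(arena *a) { (void)a; }
static void arena_destroy(arena *a) { (void)a; }
#else
/* Conventional build: a bump allocator carved out of one malloc'd block. */
typedef struct { char *base; size_t used, cap; } arena;

static int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base != NULL;
}

static void *arena_alloc(arena *a, size_t n)
{
    assert(a != NULL);
    if (n > a->cap) return NULL;          /* also avoids overflow below */
    n = (n + 15) & ~(size_t)15;           /* keep 16-byte granularity */
    if (a->cap - a->used < n) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_reset(arena *a) { a->used = 0; }

static void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
#endif
```

Call sites use the same `arena_alloc`/`arena_reset` calls in both builds; only the build configuration decides whether the arena really exists.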
The GC isn't there as a backstop against memory leaks. It's there because it lets me support free() in a totally sound way, with a deterministic panic on any use-after-free. Additionally, because of the way the GC works, a program that free()s memory is immune to GC-induced memory leaks.
What is a GC-induced leak? For decades now, GC implementers like me have noticed the following phenomena:
- Someone takes a program that uses manual memory management and has no known leaks or crashes in some set of tests, and converts it to use GC. The result is a program that leaks on that set of tests! I think Boehm noticed this when evangelizing his GC. I've noticed it in manual conversions of C++ code to Java. I've heard others mention it in GC circles.
- Someone writes a program in a GC'd language. Their top perf bug is memory leaks, and they're bad. You scratch your head and wonder: wasn't the whole point of GC to avoid this?
Here's why both phenomena happen: folks have a tendency to keep dangling pointers to objects that they are no longer using. Here's an evil example I once found: there's a Window god-object that gets created for every window that gets opened. And for reasons, the Window has a previousWindow pointer to the Window from which the user initiated opening the window. The previousWindow pointer is used in initialization of the Window, but never again. Nobody nulled previousWindow.
The result? A GC-induced leak!
In a malloc/free program, the call to previousWindow.destroy() (or whatever) would also delete (free()) the object, and you'd have a dangling pointer. But it's fine because nobody dereferences it. It's a correct case of dangling pointers! But in the GC'd program, the dangling pointer keeps previousWindow around, and then there's previousWindow.previousWindow, and previousWindow.previousWindow.previousWindow, and... you get the idea.
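To make the pattern concrete, here is a small sketch in C. The `Window` struct, `window_open`, and `windows_kept_alive` are hypothetical stand-ins for the story above. `window_open` uses `previousWindow` only to inherit a setting and never clears it, and `windows_kept_alive` counts how many Windows a tracing GC would still find reachable through that field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical god-object modeled on the example above. */
typedef struct Window {
    struct Window *previousWindow; /* read during init, never again */
    int zoom;
} Window;

static Window *window_open(Window *opener)
{
    Window *w = calloc(1, sizeof *w);
    if (!w) return NULL;
    w->previousWindow = opener;
    /* The only use of previousWindow: inherit the opener's zoom level. */
    w->zoom = opener ? opener->zoom : 100;
    return w; /* previousWindow is never nulled */
}

/* Number of Windows a tracing GC would reach from w via previousWindow.
   Only safe to call while none of the openers has been free()d. */
static size_t windows_kept_alive(const Window *w)
{
    size_t n = 0;
    for (; w != NULL; w = w->previousWindow)
        n++;
    return n;
}
```

Opening three windows in a row leaves `windows_kept_alive` at 3 for the newest one. Adding `w->previousWindow = NULL;` before the `return` would bring it down to 1, which is the one-line fix nobody made.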
This is why Fil-C's answer to free() isn't to just ignore it. Fil-C strongly supports free():
- Freeing an object immediately flags the capability as being empty and free. No memory accesses will succeed on the object anymore.
- The GC does not scan any outgoing references from freed objects (and it doesn't have to because the program can't access those references). Note that there's almost a race here, except https://fil-c.org/safepoints saves us. This prevents previousWindow.previousWindow from leaking.
- For those pointers in the heap that the GC can mutate, the GC repoints the capability to the freed singleton instead of marking the freed object. If all outstanding pointers to a freed object are repointable, then the object won't get marked, and will die. This prevents previousWindow from leaking.
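The two GC rules above can be illustrated with a toy mark phase. This is a minimal sketch under loud assumptions: the `Obj` layout, `toy_free`, `mark`, and `freed_singleton` are invented for illustration and are not Fil-C's real object model; concurrency and safepoints are ignored; heap fields are treated as repointable, while a pointer passed directly to `mark` plays the role of a root the GC cannot rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FIELDS 2

/* Toy heap object (hypothetical layout, not Fil-C's). */
typedef struct Obj {
    bool freed;                     /* set by toy_free() */
    bool marked;                    /* set by the mark phase */
    struct Obj *fields[MAX_FIELDS]; /* outgoing references */
} Obj;

/* Stand-in for the freed singleton that repointed slots target. */
static Obj freed_singleton = { .freed = true };

static void toy_free(Obj *o)
{
    assert(o != NULL && !o->freed); /* toy double-free check */
    o->freed = true;                /* all later accesses would panic */
}

static void mark(Obj *o)
{
    if (o == NULL || o->marked) return;
    o->marked = true;
    if (o->freed) return;           /* rule 1: never scan a freed object */
    for (int i = 0; i < MAX_FIELDS; i++) {
        Obj *t = o->fields[i];
        if (t != NULL && t->freed) {
            /* rule 2: repoint a mutable heap slot instead of marking */
            o->fields[i] = &freed_singleton;
            continue;
        }
        mark(t);
    }
}
```

With a chain w3 → w2 → w1 where w2 has been freed, marking from w3 repoints w3's slot to the singleton and leaves both w2 and w1 unmarked. A root that points straight at w2 would keep w2 itself alive, but w1 still dies because w2 is never scanned.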
> Can the GC be configured to warn or panic if something is GCed without free()?
Nope. Reason: the Fil-C runtime itself now relies on GC, and there's some functionality that only a GC can provide that has proven indispensable for porting some complex stuff (like CPython and Perl5).
It would take a lot of work to convert the Fil-C runtime to not rely on GC. It's just too darn convenient to do nasty runtime stuff (like thread management and signal handling) by leaning on the fact that the GC prevents stuff like ABA problems. And if you did make the runtime not rely on GC, then your leak detector would go haywire in a lot of interesting ports (like CPython).
But, I think someone might end up doing this exercise eventually, because if you did it, then you could just as well build a version of Fil-C that has no GC at all but relies on the memory safety of sufficiently-segregated heaps.
Is it possible to use Fil-C as a replacement for valgrind/address sanitizer/leak sanitizer? I.e. say I have a C program that does manual memory management already. Can I then compile it with Fil-C and have it panic/assert on heap use after free, uninitialized memory read (including stack), array out of bounds read, etc?
> Nope. Reason: the Fil-C runtime itself now relies on GC, and there's some functionality that only a GC can provide that has proven indispensable for porting some complex stuff (like CPython and Perl5).
What if there was a flag you could set on an allocation, “must be freed”. An app can set the “must be freed” flag on its allocations, meaning when the GC collects the allocation, it checks if free() has been called on it, and if it hasn’t, it logs a warning (or even panics), depending on process configuration flags. Meanwhile, internal allocations by the runtime won’t set that flag, so the GC will never panic/warn on collecting them.
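A rough sketch of how that proposal might look inside a collector. Everything here is hypothetical: `alloc_header`, `leak_policy`, `on_collect`, and the `policy` variable are invented names, not an existing Fil-C API or flag. The idea is that the collector calls a hook for each object it reclaims, and only app allocations carry the "must be freed" bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { LEAK_IGNORE, LEAK_WARN, LEAK_PANIC } leak_policy;

/* Hypothetical per-allocation header in a toy GC. */
typedef struct {
    bool must_be_freed; /* set on app allocations, never on runtime ones */
    bool freed;         /* set when the program calls free() */
} alloc_header;

static leak_policy policy = LEAK_WARN; /* would come from process config */
static unsigned leak_reports;

/* Called by the toy collector when it reclaims an unreachable object. */
static void on_collect(const alloc_header *h)
{
    assert(h != NULL);
    if (!h->must_be_freed || h->freed || policy == LEAK_IGNORE)
        return;
    leak_reports++;
    fprintf(stderr, "warning: object collected without free()\n");
    if (policy == LEAK_PANIC)
        abort();
}
```

Runtime-internal objects never set `must_be_freed`, so collecting them stays silent no matter which policy is configured.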
Okay, that's brilliant. I didn't imagine the GC-induced leak problem was even solvable. I guess the freed-but-not-GCed object could be arbitrarily large, but that's almost never going to be a gradual leak.
What's awesome about the Emacs GC?