Great write-up. And thanks, mitchellh, for Ghostty. I switched to it last year and have not regretted it.
However, I am somewhat surprised that the fix is reserved for a feature release a couple of months out. I would have expected it to be included in a bug-fix release.
The moment you started talking about pages, I was like: “OK, obviously memory pooled,” and yup, it is. Then I said “obviously ring buffered,” and yeah, that's essentially your scrollback reuse. Then I knew exactly where the bug was before getting to that part: not freeing the pages' memory properly. And sure enough, bingo! With some great-looking diagrams of memory space alignment.
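For readers who want the failure mode in miniature, here is a toy C model of it as this thread describes it: standard-size pages come from a pool, larger pages are separate allocations, and the free path decides where memory goes by looking only at the page's size. All names, the 4096-byte page size, and the counters standing in for real memory are hypothetical; this is an illustration, not Ghostty's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standard page size; anything larger is a "large" page. */
#define STD_PAGE_SIZE 4096

/* Counters stand in for real memory so the leak is observable. */
static int pool_in_use = 0;   /* pooled pages currently handed out */
static int large_in_use = 0;  /* large allocations currently live */

typedef struct {
    size_t size;  /* the only thing the free path looks at */
} Page;

Page page_alloc(size_t size) {
    assert(size > 0);
    Page p;
    if (size <= STD_PAGE_SIZE) {
        p.size = STD_PAGE_SIZE;
        pool_in_use++;   /* served from the pool */
    } else {
        p.size = size;
        large_in_use++;  /* served by a separate allocation */
    }
    return p;
}

/* Infers the allocation's origin from its size alone. */
void page_free_by_size(Page *p) {
    if (p->size == STD_PAGE_SIZE)
        pool_in_use--;   /* assumed pooled: handed back to the pool */
    else
        large_in_use--;  /* assumed large: released */
}

/* Recycling an old page by shrinking its recorded size silently breaks
 * the invariant that only pooled pages have the standard size. */
void page_reuse_as_standard(Page *p) {
    p->size = STD_PAGE_SIZE;
}
```

If a large page is recycled with its size shrunk to the standard value and then freed, `large_in_use` stays at 1 and `pool_in_use` goes negative: the large allocation was never released, and the pool's bookkeeping is now wrong.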
Kudos, that was a good read. Just remember that every time you do something novel, there’s potential for leaks :D
This feels like a case of guessing at something you could know. There are two types of allocations, each with a size and a free method, and the free method is polymorphic over the allocation's type. Instead of using a tag to know with certainty which type an object is, you guess based on some other factor, in this case a size invariant that was violated. It also doesn't seem like this invariant was ever codified; otherwise, the first time a large allocation was modified to a standard size, it would have blown up. It's worth asking yourself whether your distinguishing factor is the best one available or whether there is a better test. Maybe in this case a tag would have been too expensive.
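A minimal sketch of the tag-based alternative this comment suggests, assuming the same two allocation kinds: the origin is recorded once at allocation time and the free path dispatches on it, so changing `size` later cannot misroute the free. `PageOrigin`, `page_alloc`, `page_free`, and the counters are hypothetical names for illustration, not Ghostty's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standard page size. */
#define STD_PAGE_SIZE 4096

/* Origin tag recorded at allocation time and never inferred later. */
typedef enum { ORIGIN_POOL, ORIGIN_LARGE } PageOrigin;

typedef struct {
    PageOrigin origin;  /* how the memory was obtained */
    size_t size;        /* may change when the page is recycled */
} Page;

/* Counters stand in for real memory so leaks are observable. */
static int pool_in_use = 0;
static int large_in_use = 0;

Page page_alloc(size_t size) {
    Page p;
    if (size <= STD_PAGE_SIZE) {
        p.origin = ORIGIN_POOL;
        p.size = STD_PAGE_SIZE;
        pool_in_use++;
    } else {
        p.origin = ORIGIN_LARGE;
        p.size = size;
        large_in_use++;
    }
    return p;
}

/* Dispatch on the recorded tag, so resizing cannot misroute the free. */
void page_free(Page *p) {
    switch (p->origin) {
    case ORIGIN_POOL:
        assert(pool_in_use > 0);
        pool_in_use--;
        break;
    case ORIGIN_LARGE:
        assert(large_in_use > 0);
        large_in_use--;
        break;
    }
}
```

Freeing a large page after shrinking its `size` still goes through the large path, leaving both counters at zero. The cost is one small field per page, which is the trade-off the comment raises at the end.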
Funny timing: I moved to Ghostty this week, and just today I ran into OOM crashes in Ghostty while developing a terminal UI app. Coincidentally, this TUI has a tab bar that looks like this, where UTF-8 icons are used for recognizability and activity indicators (using © and € as placeholders here):
1|Flakes © 2|Installed © 3|Store © € 4|Security © €
──────────────────────────────────────────────────────────────
This works fine normally, but resizing the terminal would quickly trigger the crash. Easy to avoid, but still annoying! I was already preparing to file a bug report with the easy repro, but this sounds suspiciously close to what the blog post describes. Fingers crossed :)
(EDIT: HN filters Unicode, booo :( )
@mitchellh, what did you use for the memory visualizations? They look nice, and the website plays well with mobile. What's the stack?
I've been following the development of Ghostty for a while, and while I have the feeling that there is a bit of over-engineering in this project, I find this kind of bug post-mortem extremely valuable for anyone in love with the craft.
Super accessible write-up for someone unfamiliar with Ghostty and terminal emulators in general. Thanks!
Reliable reproductions are so valuable.
Claude Code also has a weird issue in Ghostty where it breaks copy-paste after exiting. `reset` fixes it, but it's annoying.
Waiting for someone to say "this wouldn't have happened if you'd chosen Rust".
Why not just use a circular buffer for the scrollback? Why use blocks at all if you’re just going to recycle them anyway? That said, great write-up.
Edit: I'm getting a lot of downvotes for this, but nobody is saying why I'm wrong. If you think I'm wrong enough to downvote, please reply with why.
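To make the circular-buffer question concrete, here is a minimal fixed-capacity ring of rows in C, assuming fixed-width rows and a tiny hypothetical capacity (`MAX_ROWS`, `COLS`, and the `sb_*` names are illustrative only). Pushing past capacity overwrites the oldest row in place, so nothing is allocated or freed after setup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROWS 4  /* hypothetical scrollback capacity, tiny for illustration */
#define COLS 8      /* hypothetical fixed row width */

typedef struct {
    char rows[MAX_ROWS][COLS + 1];  /* fixed storage, allocated once */
    size_t head;                    /* index of the oldest row */
    size_t len;                     /* number of rows in use */
} Scrollback;

/* Append a row; when full, overwrite the oldest one in place. */
void sb_push(Scrollback *sb, const char *text) {
    size_t slot = (sb->head + sb->len) % MAX_ROWS;
    if (sb->len == MAX_ROWS) {
        /* Full: slot == head, so the oldest row is overwritten; advance head. */
        sb->head = (sb->head + 1) % MAX_ROWS;
    } else {
        sb->len++;
    }
    strncpy(sb->rows[slot], text, COLS);
    sb->rows[slot][COLS] = '\0';  /* strncpy may not terminate long input */
}

/* Row i, where i = 0 is the oldest retained row. */
const char *sb_get(const Scrollback *sb, size_t i) {
    assert(i < sb->len);
    return sb->rows[(sb->head + i) % MAX_ROWS];
}
```

One plausible trade-off, not necessarily Ghostty's reasoning: a flat ring like this wants a fixed row width and capacity, while a terminal has to reflow rows on resize and carry per-row metadata such as styles, which is harder to manage without some block structure.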
I don't understand why that is the preferred fix. I would have solved it in other ways:
1. When resizing the page, leave a flag recording how it was allocated. This kind of tagging is commonly done in the always-zero bits of size or address fields to save space.
2. Since the pool is a contiguous region of known size, check whether the memory to be freed falls within that range.
3. Make the size immutable. If you want to realloc, go for it, and have the memory manager handle that boundary for you.
These approaches not only preserve functionality that seems to have been lost with the feature reduction, but are also more future-proof against any other changes in size.
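The first two suggestions can be sketched in a few lines of C. All helpers below are hypothetical: `in_pool` assumes the pool is a single contiguous region, and the pointer-tagging helpers assume allocations are at least 2-byte aligned so bit 0 of the address is always zero. Converting pointers to `uintptr_t` for comparison is implementation-defined in C but behaves as expected on mainstream platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Suggestion 2: is ptr inside the pool's contiguous backing region? */
bool in_pool(const void *ptr, const void *pool_base, size_t pool_bytes) {
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)pool_base;
    /* Unsigned subtraction also rejects p < base (it wraps to a huge value). */
    return p >= base && p - base < pool_bytes;
}

/* Suggestion 1: stash a "large allocation" flag in bit 0 of an aligned pointer. */
#define LARGE_TAG ((uintptr_t)1)

uintptr_t tag_large(void *ptr) {
    assert(((uintptr_t)ptr & LARGE_TAG) == 0);  /* requires >= 2-byte alignment */
    return (uintptr_t)ptr | LARGE_TAG;
}

bool is_large(uintptr_t tagged) {
    return (tagged & LARGE_TAG) != 0;
}

/* Strip the tag before the pointer is used or freed. */
void *untag(uintptr_t tagged) {
    return (void *)(tagged & ~LARGE_TAG);
}
```

The range check needs no extra storage, but if the pool grows in multiple chunks it would have to test each chunk. The tag costs no memory either, but every use of the pointer must go through `untag`.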
Speaking of Claude Code in Ghostty, I’ve noticed I can’t drag and drop images into the prompt when the session is within a tmux pane. I miss that, coming from the Mac Terminal app, which allowed me to do so. I’d be willing to look into this myself, but I'm mentioning it in case someone already knows where to start looking.
Would this kind of bug have been caught by the Rust compiler?
I wonder how a Rust-based terminal implements this without sacrificing performance.
The number of people here on HN gaslighting those who said they ran into this bug and challenging them to prove it was real...
There are times when it just makes sense to read, measure, and really understand why leaks, bugs, and performance issues happen, and vibe-coders will get stuck on this very quickly.
This excellent write-up from mitchellh explains the issue in depth, and all his posts on building Ghostty are recommended reading on Ghostty's internals.
Similarly, these write-ups are a great read. Here is another one that documents a goroutine leak and how it was detected and fixed without restarting production. [0]
This is what most vibe-coders will NOT do when faced with a non-trivial issue in a serious software product.
[0] https://skoredin.pro/blog/golang/goroutine-leak-debugging
The contrast between the attitude here https://news.ycombinator.com/item?id=46461860 and in this story is a bit wacky to me.
Ugh. Is it just me, or is anyone else feeling a tad uncomfortable that their terminal app needs a custom memory allocator that mucks with low-level page tags?
I hate to say it, but this probably would not have happened in a garbage collected language.
GC languages are fast these days. If you don't want a runtime like C# (which has excellent performance), a language like Go would have worked just fine here, compiling to a small native binary but with a GC.
I don't really understand the aversion to GCs. In memory-constrained scenarios or where performance is an absolute top priority, I understand wanting manual control. But that seems like a very rare scenario in user space.
What's the best terminal for Claude Code? I'm not sure Ghostty is it. Which one can sync to an iPhone or Android tablet for remote use of the same session?
This is great news! Well done to everyone who helped sort it out. It was a problem noted by users in a thread here just last week, https://news.ycombinator.com/item?id=46460319
While Claude Code might be the reason this bug started being triggered by more people, some of us were hitting it without ever having used Claude Code at all. Maybe the assumption about what makes a page non-standard isn't as black-and-white as presumed. I also wonder whether the leak would be triggered more often for people who use scrollback-limit = 0, or something very small.
Probably not a huge deal, but it does seem the fix will needlessly delete and recreate non-standard pages in the case where the new page needs to be non-standard and the oldest one (the one that needs to be pruned) is already non-standard and could be reused.
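A minimal sketch of the reuse policy this comment describes, under assumed semantics: each page records whether it came from the pool and how many bytes back it. `choose_prune_action`, `PruneAction`, and the other names are hypothetical, not Ghostty's API. A non-standard oldest page is reused only for another non-standard page that fits, so a large allocation is never relabeled as standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STD_PAGE_BYTES 4096  /* hypothetical standard page size */

typedef struct {
    bool pooled;      /* true if the page came from the standard pool */
    size_t capacity;  /* bytes actually backing the page */
} Page;

typedef enum {
    REUSE_OLDEST,       /* recycle the pruned page in place */
    DESTROY_AND_CREATE  /* free the pruned page, allocate a fresh one */
} PruneAction;

PruneAction choose_prune_action(const Page *oldest, size_t needed_bytes) {
    assert(oldest != NULL);
    bool need_standard = needed_bytes <= STD_PAGE_BYTES;
    if (oldest->pooled)
        return need_standard ? REUSE_OLDEST : DESTROY_AND_CREATE;
    /* Non-standard oldest page: reuse it only for another non-standard page
     * that fits. Reusing it as a standard page would mislabel a large
     * allocation, so that case still goes through destroy-and-create. */
    if (!need_standard && needed_bytes <= oldest->capacity)
        return REUSE_OLDEST;
    return DESTROY_AND_CREATE;
}
```

One caveat: reusing a much larger page for a slightly smaller non-standard need keeps the extra memory alive, so a real policy might also cap how oversized a reused page may be.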