(author here) I agree that it's a lot of complexity, and I acknowledge this in the article: You can get quite far with just a bump allocator.
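For readers who haven't seen one, here is a minimal sketch of what "just a bump allocator" usually means. It is not the design from the article. It assumes a single fixed-size buffer and a power-of-two alignment. `BumpArena`, `alloc`, and `reset` are hypothetical names. Python is used only to show the bookkeeping: offsets into one `bytearray` stand in for pointers, and it says nothing about cache behavior.

```python
import struct


class BumpArena:
    """Hypothetical minimal bump allocator over one contiguous buffer."""

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)  # the single backing block
        self.offset = 0                 # next free byte: the "bump pointer"

    def alloc(self, size: int, align: int = 8) -> int:
        """Return the start offset of `size` fresh bytes (stands in for a pointer)."""
        # Round the current offset up to a multiple of `align` (a power of two).
        start = (self.offset + align - 1) & ~(align - 1)
        end = start + size
        if end > len(self.buf):
            raise MemoryError("arena exhausted")
        self.offset = end  # bump; individual allocations are never freed
        return start

    def reset(self) -> None:
        """Release every allocation at once, e.g. when the AST is discarded."""
        self.offset = 0


if __name__ == "__main__":
    arena = BumpArena(64)
    tag = arena.alloc(3)                  # offset 0
    value = arena.alloc(8)                # offset 8, rounded up from 3
    struct.pack_into("<d", arena.buf, value, 1.5)
    print(struct.unpack_from("<d", arena.buf, value)[0])  # prints 1.5
```

The appeal is that allocation is just an add and a compare, and nothing is freed individually. That fits an AST, which typically lives until the whole compilation unit is done and is then dropped in one go.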
I didn't go into this in the article at all, but the main benefit of the article's design is how well it interacts with the CPU cache. That has almost no effect on the parser, because a parser typically just writes the AST rather than reading it. I believe the later stages, which traverse the tree, benefit much more from the faster traversal.
(By the way, I am a huge fan of your work. Crafting Interpreters was my introduction to programming languages!)