It looks like, overall, this design makes the parser about twice as fast as a straightforward one that builds a pointer-based tree AST.
That's not nothing. But a parser is rarely the most time-intensive part of a production compiler. And the parser does get iterated on a lot in languages that are evolving and adding new syntax.
Given that, I'd be inclined to take the performance hit and stick with a simpler AST representation if that yields a more hackable, maintainable compiler front end.
Usually yes, but it's still a neat trick to be aware of. For interpreted scripting languages, parsing can actually be a significant slowdown. Even more so once you get into text-based network protocols, which also need parsers (is CSS a programming language or a network protocol? :) )
(author here) I agree that it's a lot of complexity, and I acknowledge this in the article: "You can get quite far with just a bump allocator."
I didn't go into this at all, but the main benefit of this design is how well it interacts with CPU cache. This has almost no effect on the parser, because you're typically just writing the AST, not reading it. I believe that subsequent stages benefit much more from faster traversal.
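To make the flat-AST idea concrete, here's a minimal Rust sketch: all nodes live in one `Vec`, and they refer to each other by `u32` indices instead of `Box` pointers. The names (`ExprPool`, `ExprRef`) and the tiny expression language are illustrative, not necessarily the article's exact code.

```rust
// A node reference is just an index into the pool, not a pointer.
#[derive(Copy, Clone)]
struct ExprRef(u32);

// A tiny expression language, enough to show the layout.
enum Expr {
    Lit(i64),
    Add(ExprRef, ExprRef),
}

// All expressions live contiguously in one Vec (the "arena").
struct ExprPool(Vec<Expr>);

impl ExprPool {
    // Appending a node is a bump-allocator-style push; the parser
    // only ever writes to the end of the array.
    fn add(&mut self, e: Expr) -> ExprRef {
        let idx = self.0.len() as u32;
        self.0.push(e);
        ExprRef(idx)
    }

    fn get(&self, r: ExprRef) -> &Expr {
        &self.0[r.0 as usize]
    }

    // A recursive walk still works; it just chases indices, not pointers.
    fn eval(&self, r: ExprRef) -> i64 {
        match self.get(r) {
            Expr::Lit(n) => *n,
            Expr::Add(a, b) => self.eval(*a) + self.eval(*b),
        }
    }
}

fn main() {
    let mut pool = ExprPool(Vec::new());
    let a = pool.add(Expr::Lit(2));
    let b = pool.add(Expr::Lit(3));
    let sum = pool.add(Expr::Add(a, b));
    println!("{}", pool.eval(sum)); // prints 5
}
```

Note that `ExprRef` can be 32 bits where a `Box` would be 64, so the nodes themselves are smaller too, which is part of the cache win.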
(By the way, I am a huge fan of your work. Crafting interpreters was my introduction to programming languages!)
That's a good caution. However, traversing a flat AST (iterating a "struct of arrays" rather than a pointer-based tree) is also going to be faster. So subsequent passes of the compiler, say type checking and code emission, will also be faster. But by how much, or whether it's worth it even then, I'm not sure.
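To illustrate why that traversal can be fast: a parser appends children before their parents, so a later pass can often just walk the array front to back with no recursion or pointer chasing. A hedged Rust sketch (the types are illustrative, and it assumes the root node is last in the pool):

```rust
// Illustrative flat AST: children always appear at lower indices than
// their parents, which is the order a parser naturally appends them in.
#[derive(Copy, Clone)]
struct ExprRef(u32);

enum Expr {
    Lit(i64),
    Add(ExprRef, ExprRef),
}

// Evaluate with one linear pass instead of a recursive tree walk.
// Assumption: node i's children have indices < i, and the root is last.
fn eval_linear(pool: &[Expr]) -> i64 {
    let mut vals = vec![0i64; pool.len()];
    for (i, e) in pool.iter().enumerate() {
        vals[i] = match e {
            Expr::Lit(n) => *n,
            Expr::Add(a, b) => vals[a.0 as usize] + vals[b.0 as usize],
        };
    }
    *vals.last().expect("empty pool")
}

fn main() {
    // (2 + 3) + 4, laid out flat
    let pool = vec![
        Expr::Lit(2),                      // index 0
        Expr::Lit(3),                      // index 1
        Expr::Add(ExprRef(0), ExprRef(1)), // index 2
        Expr::Lit(4),                      // index 3
        Expr::Add(ExprRef(2), ExprRef(3)), // index 4: root
    ];
    println!("{}", eval_linear(&pool)); // prints 9
}
```

Real passes like type checking would compute a type per node instead of an `i64`, but the access pattern is the same: sequential reads the prefetcher loves.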