Hacker News

vivzkestrel, today at 4:43 PM (2 replies)

- As someone who is not at all familiar with how code editors work and has to build a browser-based code editor, what do you think I should know?

- I got a hint of language servers and tree-sitter thanks to this wonderfully written post, but it is still missing a lot of details, like what the protocol actually looks like and what a standard language server or tree-sitter implementation looks like.

- What are the other building blocks?


Replies

feznyng, today at 7:03 PM

Language servers (LSP implementations) rely on a parser to generate an AST for a given language. This parser needs to be error-tolerant, because it has to return usable ASTs even though it is usually parsing incomplete or incorrect code, and fast enough to run on every keystroke so it can provide real-time feedback.

Most of the time they rely on their own hand-rolled recursive descent parser. Writing one isn't necessarily hard, but it is time-consuming and tedious, especially if you're parsing a large language like C++.

Parser generators like yacc, bison, and ANTLR can generate a parser for you given a grammar (chumsky is similar in spirit, though it is a parser-combinator library rather than a generator). However, these parsers usually don't have the best performance or error-reporting characteristics because they are auto-generated. A hand-written recursive descent parser is usually faster and, because you can customize its syntax error messages, easier for a language server to use to provide good diagnostics.
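To make "error-tolerant recursive descent" a bit more concrete, here is a minimal sketch for a toy grammar of my own invention (`expr := term ("+" term)*`, `term := NUMBER | "(" expr ")"`). The node shapes, function names, and diagnostic strings are all hypothetical and not taken from any real language server. The point is only that the parser records a diagnostic and returns a placeholder node instead of throwing, so there is always a tree to work with.

```typescript
// Toy AST for the hypothetical grammar above; "error" nodes mark recovered spots.
type Node =
  | { kind: "num"; value: number }
  | { kind: "add"; left: Node; right: Node }
  | { kind: "error"; message: string };

interface ParseResult {
  tree: Node;
  diagnostics: string[];
}

// Numbers, "+", and parentheses become tokens; any other
// non-space character becomes a single (invalid) token.
function tokenize(src: string): string[] {
  return src.match(/\d+|[+()]|\S/g) ?? [];
}

function parse(src: string): ParseResult {
  const tokens = tokenize(src);
  const diagnostics: string[] = [];
  let pos = 0;

  function term(): Node {
    const tok = tokens[pos];
    if (tok !== undefined && /^\d+$/.test(tok)) {
      pos++;
      return { kind: "num", value: Number(tok) };
    }
    if (tok === "(") {
      pos++;
      const inner = expr();
      if (tokens[pos] === ")") pos++;
      else diagnostics.push(`expected ")" at token ${pos}`);
      return inner;
    }
    // Recovery: report the problem and return a placeholder instead of throwing.
    const message =
      tok === undefined ? "unexpected end of input" : `unexpected "${tok}"`;
    diagnostics.push(`${message} at token ${pos}`);
    // Skip a junk token, but leave "+" and ")" for the caller to handle.
    if (tok !== undefined && tok !== "+" && tok !== ")") pos++;
    return { kind: "error", message };
  }

  function expr(): Node {
    let left = term();
    while (tokens[pos] === "+") {
      pos++;
      left = { kind: "add", left, right: term() };
    }
    return left;
  }

  const tree = expr();
  if (pos < tokens.length) {
    diagnostics.push(`unexpected trailing input at token ${pos}`);
  }
  return { tree, diagnostics };
}
```

Parsing the half-typed input `1 + ` still yields an `add` node whose right side is an `error` node, plus one diagnostic, which is roughly what an editor needs to keep highlighting and completion working while the user types.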

Tree-sitter is also a parser generator, but it has better error-tolerance properties (not quite as good as hand-written, but generally better than prior implementations). Additionally, it's incremental, meaning it can reuse prior parses to create a new AST more efficiently. Most hand-written parsers are not incremental but are usually still fast enough to be usable in language servers.
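For an incremental parser to reuse a prior parse, the editor has to tell it which part of the text changed. The sketch below computes that range by trimming the common prefix and suffix of the old and new text. The `TextEdit` shape and the `diffEdit` name are hypothetical, chosen for illustration rather than copied from tree-sitter's API, and the offsets are UTF-16 code units rather than bytes. A real editor would normally get the range directly from its change events instead of diffing.

```typescript
// Hypothetical description of a single edit, as offsets into the text.
interface TextEdit {
  startIndex: number;  // first offset that differs
  oldEndIndex: number; // end of the replaced range in the old text
  newEndIndex: number; // end of the replacement in the new text
}

function diffEdit(oldText: string, newText: string): TextEdit {
  // Walk forward over the shared prefix.
  const maxPrefix = Math.min(oldText.length, newText.length);
  let start = 0;
  while (start < maxPrefix && oldText[start] === newText[start]) start++;

  // Walk backward over the shared suffix, never crossing the prefix.
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (
    oldEnd > start &&
    newEnd > start &&
    oldText[oldEnd - 1] === newText[newEnd - 1]
  ) {
    oldEnd--;
    newEnd--;
  }
  return { startIndex: start, oldEndIndex: oldEnd, newEndIndex: newEnd };
}
```

Syntax nodes that lie entirely outside the old range are candidates for reuse; only the region around the edit needs to be parsed again.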

To use tree-sitter, you define a grammar in JavaScript, which tree-sitter uses to generate a parser in C; you can then use that parser as a dynamic or static library in your application.

In your case, this is useful because you can compile those C libraries to WASM, which runs right in the browser and will usually be faster than pure JS (the one catch is serialization overhead between JS and WASM). The problem is that you still need to implement all the language analysis features on top.

- A good overview of different parsing techniques: https://tratt.net/laurie/blog/2020/which_parsing_approach.ht...
- LSP spec: https://microsoft.github.io/language-server-protocol/overvie...
- VSCode's guide on LSP features: https://code.visualstudio.com/api/language-extensions/progra...
- Tutorial on creating hand-rolled error-tolerant (but NOT incremental) recursive descent parsers: https://matklad.github.io/2023/05/21/resilient-ll-parsing-tu...
- Tree-sitter book: https://tree-sitter.github.io/tree-sitter/
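On what the protocol actually looks like: LSP messages are JSON-RPC 2.0 payloads, and in the base protocol each one is prefixed with a `Content-Length` header giving the body size in bytes, followed by a blank line. Here is a minimal sketch of framing a `textDocument/didChange` notification. The `frame` and `utf8ByteLength` helpers and the example URI are my own for illustration; the method name and parameter shape follow the spec. In a browser you may end up sending the JSON over a worker or WebSocket instead, in which case the header framing may not apply.

```typescript
interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params: unknown;
}

// Content-Length counts bytes, so compute the UTF-8 size by hand
// rather than using the string's UTF-16 length.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Hypothetical helper: header, blank line, then the JSON body.
function frame(message: JsonRpcNotification): string {
  const body = JSON.stringify(message);
  return `Content-Length: ${utf8ByteLength(body)}\r\n\r\n${body}`;
}

// Full-document sync: the client sends the whole new text on each change.
const didChange: JsonRpcNotification = {
  jsonrpc: "2.0",
  method: "textDocument/didChange",
  params: {
    textDocument: { uri: "file:///example/main.ts", version: 2 },
    contentChanges: [{ text: "let x = 12;" }],
  },
};

const wireMessage = frame(didChange);
```

Requests look the same but carry an `id`, and the server answers with a response that echoes that `id`; notifications like this one get no reply.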

ferguess_k, today at 5:51 PM

I don't know why you got downvoted. This article doesn't provide much detail. I'd expect at least a series of posts for the comparison.

Let me be blunt: any article posted here should provide more information or more in-depth analysis than Wikipedia. Since I'm not a compiler person, I might be too harsh in suggesting that this article does not go deeper than the Wikipedia article (it is definitely shorter); I apologize if that's the case.