logoalt Hacker News

lioeterstoday at 4:18 PM3 repliesview on HN

> extra step to get from CST to AST

Could you elaborate on what this involves? I'm also looking at using tree-sitter as a parser for a new language, possibly to support multiple syntaxes. I'm thinking of converting its parse trees to a common schema, that's the target language.

I guess I don't quite get the difference between a concrete and abstract syntax tree. Is it just that the former includes information that's irrelevant to the semantics of the language, like whitespace?


Replies

WilcoKruijertoday at 6:32 PM

In this context you could say that CST -> AST is a normalization process. A CST might contain whitespace and comments, an AST almost certainly won't.

An example: in a CST `1 + 0x1 ` might be represented differently than `1 + 1`, but they could be equivalent in the AST. The same could be true for syntax sugar: `let [x,y] = arr;` and `let x = arr[0]; let y = arr[1];` could be the same after AST normalization.

You can see why having just the AST might not be enough for syntax highlighting.

As a side project I've been working on a simple programming language, where I use tree-sitter for the CST, but first normalize it to an AST before I do semantic analysis such as verifying references.

FjordWardentoday at 4:47 PM

TS returns a tree with nodes, you walk the nodes with a visitor pattern. I've experimented with using tree-sitter queries for this, but for now not found this to be easier. Every syntax will have its own CST but it can target a general AST if you will. At the end they can both be represented as s-expressions and but you need rules to go from one flavour of syntax tree to the other.

AST is just CST minus range info and simplified/generalised lexical info (in most cases).

direwolf20today at 4:43 PM

That's correct.