Hacker News

ainiriand 11/08/2024

Sorry to bother you, but would that be open-source by any chance? Is there any public repo available? Thank you.


Replies

norman784 11/08/2024

I wrote my fair share of parsers over the last year, and the one I liked most is from the Salsa examples; you can find it here [0].

[0] https://github.com/salsa-rs/salsa/blob/e4d36daf2dc4a09600975...

brundolf 11/08/2024

Yup! You can find it here: https://github.com/brundonsmith/bagel-rs/blob/master/src/mod...

[trying to remind myself how this works because it's been a while]

So it's got macros for defining "union types", which combine a bunch of individual structs into an enum with same-name variants, and implement From and TryFrom to box/unbox the structs in their group's enum
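A rough sketch of what such a "union type" macro could look like. This is not the code from the linked repo: the macro name `union_type`, the node structs, and the `()` error type are all made up for illustration.

```rust
use std::convert::TryFrom;

// Hypothetical node structs; the real project's names and fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct BooleanLiteral(pub bool);

#[derive(Debug, Clone, PartialEq)]
pub struct NumberLiteral(pub f64);

// Hypothetical macro: builds an enum whose variants share the names of the
// structs they wrap, plus From (struct -> enum) and TryFrom (enum -> struct).
macro_rules! union_type {
    ($name:ident = $($member:ident)|+) => {
        #[derive(Debug, Clone, PartialEq)]
        pub enum $name {
            $($member($member)),+
        }

        $(
            // Boxing a struct into its group's enum always succeeds.
            impl From<$member> for $name {
                fn from(value: $member) -> Self {
                    $name::$member(value)
                }
            }

            // Unboxing only succeeds when the variant matches.
            impl TryFrom<$name> for $member {
                type Error = ();

                fn try_from(value: $name) -> Result<Self, Self::Error> {
                    match value {
                        $name::$member(inner) => Ok(inner),
                        #[allow(unreachable_patterns)]
                        _ => Err(()),
                    }
                }
            }
        )+
    };
}

union_type!(Expression = BooleanLiteral | NumberLiteral);
```

With this, `BooleanLiteral(true).into()` yields an `Expression`, and `NumberLiteral::try_from(expr)` fails unless `expr` really holds a `NumberLiteral`.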

ASTInner is a struct that holds the Any (all possible AST nodes) enum in its `details` field, alongside some other info we want all AST nodes to have

And then AST<TKind> is a struct that holds (1) an Rc<ASTInner>, and (2) a PhantomData<TKind>, where TKind is the (hierarchical) type of AST struct that it's known to contain

AST<TKind> can then be:

1. Downcast to a TKind (basically just unboxing it)

2. Upcast to an AST<Any>

3. Recast to a different AST<TKind> (changing the box's PhantomData type but not actually transforming the value). This uses trait implementations (generated by the macros) to know automatically which parent types it can be upcast to, and which more-specific types it can try to be cast to

The above three methods also have try_ versions
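The wrapper and its casts can be sketched roughly like this. Hand-written impls stand in for what the macros would generate, and the `Kind` and `SubKindOf` traits, the `span` field, and the node structs are assumptions for illustration, not the repo's actual definitions.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

// Hypothetical node structs; the real project has many more.
#[derive(Debug, Clone, PartialEq)]
pub struct BooleanLiteral(pub bool);
#[derive(Debug, Clone, PartialEq)]
pub struct LetStatement(pub String);

// A group of expression nodes, and the group of every node.
#[derive(Debug, Clone, PartialEq)]
pub enum Expression { BooleanLiteral(BooleanLiteral) }
#[derive(Debug, Clone, PartialEq)]
pub enum Any { BooleanLiteral(BooleanLiteral), LetStatement(LetStatement) }

// Hypothetical trait: how to extract a given kind from the stored `Any`.
pub trait Kind: Sized {
    fn from_any(details: &Any) -> Option<Self>;
}
impl Kind for Any {
    fn from_any(details: &Any) -> Option<Self> { Some(details.clone()) }
}
impl Kind for BooleanLiteral {
    fn from_any(details: &Any) -> Option<Self> {
        match details { Any::BooleanLiteral(b) => Some(b.clone()), _ => None }
    }
}
impl Kind for Expression {
    fn from_any(details: &Any) -> Option<Self> {
        BooleanLiteral::from_any(details).map(Expression::BooleanLiteral)
    }
}

// Marker trait: `Self` is always also a `TParent`, so recasting cannot fail.
pub trait SubKindOf<TParent> {}
impl SubKindOf<Expression> for BooleanLiteral {}

pub struct ASTInner {
    pub details: Any,
    pub span: (usize, usize), // stand-in for info shared by all nodes
}

pub struct AST<TKind> {
    inner: Rc<ASTInner>,
    kind: PhantomData<TKind>,
}

impl AST<Any> {
    pub fn new(details: Any, span: (usize, usize)) -> Self {
        AST { inner: Rc::new(ASTInner { details, span }), kind: PhantomData }
    }
}

impl<TKind: Kind> AST<TKind> {
    // Unbox the concrete value; the only cast here that clones node data.
    pub fn downcast(&self) -> TKind {
        TKind::from_any(&self.inner.details).expect("kind tag matches contents")
    }
    // Forget the specific kind; only the phantom type changes.
    pub fn upcast(self) -> AST<Any> {
        AST { inner: self.inner, kind: PhantomData }
    }
    // Infallible: only compiles when TKind is declared a sub-kind of TParent.
    pub fn recast<TParent>(self) -> AST<TParent> where TKind: SubKindOf<TParent> {
        AST { inner: self.inner, kind: PhantomData }
    }
    // Fallible: checks the stored value at runtime. This sketch checks by
    // attempting an extraction; matching on the variant would avoid the clone.
    pub fn try_recast<TOther: Kind>(self) -> Option<AST<TOther>> {
        if TOther::from_any(&self.inner.details).is_some() {
            Some(AST { inner: self.inner, kind: PhantomData })
        } else {
            None
        }
    }
}
```

In this sketch an `AST<BooleanLiteral>` can call `.recast::<Expression>()` with no runtime check, while an `AST<Any>` has to go through `.try_recast::<Expression>()` and handle the `None` case.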

What this means is that you can write functions against, e.g., AST<Expression>. Callers have to pass an AST<Expression>, but an AST<BooleanLiteral> can be infallibly recast to an AST<Expression>, whereas an AST<Any> can only try_recast to AST<Expression> (returning an Option<AST<Expression>>)

Another cool property of this is that there are no dynamic traits, and the only heap pointers are the Rc's between AST nodes (and at the root node). Everything else is enums and concrete structs; the re-casting happens solely with that PhantomData, at the type level, without actually changing any data or even cloning the Rc unless you unbox the details (in downcast())
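To illustrate the "type level only" part, here is a stripped-down stand-in (the `Tagged`, `Inner`, and `retag` names are hypothetical, not from the repo). Changing the phantom type moves the same Rc into a new wrapper, so the pointer and the reference count stay the same, and the PhantomData field takes up no space.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::rc::Rc;

// Stand-in for the shared node data.
pub struct Inner(pub u32);

// Zero-sized tags standing in for node kinds.
pub struct Expression;
pub struct BooleanLiteral;

pub struct Tagged<TKind> {
    inner: Rc<Inner>,
    kind: PhantomData<TKind>,
}

impl<TKind> Tagged<TKind> {
    pub fn new(value: u32) -> Self {
        Tagged { inner: Rc::new(Inner(value)), kind: PhantomData }
    }

    // Changes only the compile-time tag; the Rc is moved, not cloned.
    pub fn retag<TOther>(self) -> Tagged<TOther> {
        Tagged { inner: self.inner, kind: PhantomData }
    }

    pub fn strong_count(&self) -> usize {
        Rc::strong_count(&self.inner)
    }

    pub fn ptr(&self) -> *const Inner {
        Rc::as_ptr(&self.inner)
    }
}

// True on current rustc: the wrapper is no bigger than the Rc it holds.
pub fn wrapper_is_pointer_sized() -> bool {
    size_of::<Tagged<Expression>>() == size_of::<Rc<Inner>>()
}
```

After `let expr: Tagged<Expression> = lit.retag();`, `expr.ptr()` still points at the original allocation and `expr.strong_count()` is still 1.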

I worked in this codebase for a while and the dev experience was actually quite nice once I got all this set up. But figuring it out in the first place was a nightmare

I'm wondering now if it would be possible/worthwhile to extract it into a crate

yu3zhou4 11/08/2024

Maybe it can work as a quick glimpse into how a parser and lexer can work in Rust: https://github.com/jmaczan/0x6b73746b

I wrote it a long time ago and it's not fully implemented, though