logoalt Hacker News

pimeys11/08/20243 repliesview on HN

OCaml has been pretty common tool to write parsers for many years. Not a bad choice.

I've written parsers professionally with Rust for two companies now. I have to say the issues you had with the borrow checker are just in the beginning. After working with Rust a bit you realize it works miracles for parsers. Especially if you need to do runtime parsing in a network service serving large traffic. There are some good strategies we've found out to keep the borrow checker happy and at the same time writing the fastest possible code to serve our customers.

I highly recommend taking a look how flat vectors for the AST and using typed vector indices work. E.g. you have vector for types as `Vec<Type>` and fields in types as `Vec<(TypeId, Field)>`. Keep these sorted, so you can implement lookups with a binary search, which works quite well with CPU caches and is definitely faster than a hashmap lookup.

The other cool thing with writing parsers with Rust is how there are great high level libraries for things like lexing:

https://crates.io/crates/logos

The cool thing with Logos is it keeps the source data as a string under the surface, and just refers to a specific locations in it. Now use these tokens as a basis for your AST tree, which is all flat data structures and IDs. Simplify the usage with a type:

    #[Clone, Copy]
    struct Walker<'a, Id> {
        pub id: Id,
        pub ast: &'a Ast,
    }

    impl<'a, Id> Walker<'a, Id> {
        pub fn walk<T>(self, other_id: T) -> Walker<'a, T> {
            Walker { id: other_id, ast: self.ast }
        }
    }
Now you can specialize these with type aliases:

    type TypeWalker<'a> = Walker<'a, TypeId>;
And implement methods:

    impl<'a> TypeWalker<'a> {
        fn as_ref(&self) -> &'a Type {
            &self.ast[self.id]
        }
        
        fn name(&self) -> &'a str {
            &self.as_ref().name
        }
    }
From here you can introduce string interning if needed, it's easy to extend. What I like about this design is how all the IDs and Walkers are Copy, so you can pass them around as you like. There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.

I understand Rust feels hard especially in the beginning. You need to program more like you write C++, but with Rust you are enforced to play safe. I would say an amazing strategy is to first write a prototype with Ocaml, it's really good for that. Then, if you need to be faster, do a rewrite in Rust.


Replies

miningape11/08/2024

Thanks for your comment, you've given me a lot to chew on and I think I'll need to bookmark this page.

> I've written parsers professionally with Rust for two companies now

If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now),

> Especially if you need to do runtime parsing in a network service serving large traffic.

I almost expected something like this, it just makes sense with how the language is positioned. I'm not sure if you've been following cloudflare's pingora blogs but I've found them very interesting because of how they are able to really optimise parts of their networking without looking like a fast-inverse-sqrt.

> There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.

I really like the sound of this, it wasn't necessarily confusing to work with Rc and Weak but more I had to put in a lot of extra thought up front (which is also valuable don't get me wrong).

> I would say an amazing strategy is to first write a prototype with Ocaml, it's really good for that.

Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.

show 1 reply
packetlost11/08/2024

There's also the phenomenal pest library. It probably wouldn't be as fast, but I've found that usually parsing a performance critical part of a system. If it is, a manually writing the parser is definitely the way to go.

marcosdumay11/08/2024

> Especially if you need to do runtime parsing in a network service serving large traffic

Yeah, that's the focus of it, and the thing you can use Rust well.

All the popular Rust parsing libraries aren't even focused on the use that most people use "parser" to name. They can't support language-parsing at all, but you only discover that after you spent weeks fighting with the type-system to get to the real PL problems.

Rust itself is parsed by a set of specialized libraries that won't generalize to other languages. Everything else is aimed at parsing data structures.

show 1 reply