Hacker News

wahern today at 5:56 AM

Lexing C is simple, except that typedefs technically make the grammar non-context-free (see https://en.wikipedia.org/wiki/Lexer_hack). When handwriting a parser it's no big deal, but it's often a stumbling block for parser generators and other formal approaches. Though I recall there's a PEG-based parser for C99/C11 floating around that was supposed to be compliant. I'm having trouble finding a link, and it may have been using something like LPeg, which has features beyond pure PEG that help with context-dependent parsing.
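For anyone who hasn't run into the lexer hack, here is a minimal Python sketch of the idea, not taken from any real compiler. The names lex, TOKEN_RE and BUILTIN_TYPES are made up for illustration, and the only typedef form it recognizes is `typedef <type> <name>;`. The point is that the lexer cannot classify a name without a table that is filled in as declarations are processed.

```python
import re

# Hypothetical token pattern: identifiers, integers, and a few punctuators.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|[*;=])")

BUILTIN_TYPES = {"int", "char", "long"}  # tiny illustrative subset


def lex(source, typedef_names):
    """Tokenize `source`, classifying names against `typedef_names`.

    `typedef_names` is mutated as typedef declarations are seen. That
    feedback from declarations into the lexer is what the hack relies on.
    """
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        text = m.group(1)
        pos = m.end()
        if text == "typedef":
            kind = "TYPEDEF"
        elif text in BUILTIN_TYPES or text in typedef_names:
            kind = "TYPE_NAME"
        elif text[0].isalpha() or text[0] == "_":
            kind = "IDENTIFIER"
        elif text.isdigit():
            kind = "NUMBER"
        else:
            kind = text  # punctuators are their own kind
        tokens.append((kind, text))
        # Feedback step: `typedef <type> <name> ;` registers <name> as a type.
        if (kind == ";" and len(tokens) >= 4
                and tokens[-4][0] == "TYPEDEF"
                and tokens[-3][0] == "TYPE_NAME"
                and tokens[-2][0] == "IDENTIFIER"):
            typedef_names.add(tokens[-2][1])
    return tokens
```

Calling lex("typedef int A; A * B;", set()) tags the second A as TYPE_NAME, while lex("A * B;", set()) tags it as IDENTIFIER. The same characters produce different token streams depending on earlier declarations, which is the part a purely context-free toolchain has no way to express.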


Replies

nextaccountic today at 6:19 AM

Clang's solution (presented at the end of the Wikipedia article you linked) seems much better: just use a single lexical token for both types and variables.

Then only the parser needs to be context-sensitive, for the A * B; construct, which is either a no-op multiplication (if A is a variable) or a declaration of a variable of pointer type (if A is a type).
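A toy Python sketch of that split, to make it concrete. This is not Clang's code: tokenize, parse_statement and the type_names set are hypothetical names, and the parser only understands statements shaped like `X * Y;`. The lexer emits one IDENT kind for every name, and the decision is deferred to the parser, which asks its own table whether the left-hand name is a type.

```python
import re


def tokenize(source):
    """Context-free lexer: every name is an IDENT, whatever it refers to.

    Characters outside the pattern are silently skipped in this sketch.
    """
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|\d+|[*;=]", source):
        if text[0].isalpha() or text[0] == "_":
            tokens.append(("IDENT", text))
        elif text.isdigit():
            tokens.append(("NUMBER", text))
        else:
            tokens.append((text, text))  # punctuators are their own kind
    return tokens


def parse_statement(tokens, type_names):
    """Classify `X * Y ;` using the parser's own table of type names."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["IDENT", "*", "IDENT", ";"]:
        raise SyntaxError("this sketch only handles statements like `X * Y;`")
    left, right = tokens[0][1], tokens[2][1]
    if left in type_names:
        # A names a type, so this declares B as a pointer to A.
        return ("declaration", {"name": right, "type": f"pointer to {left}"})
    # A names a variable, so this is a multiplication whose result is unused.
    return ("expression", {"op": "*", "lhs": left, "rhs": right})
```

The token list for "A * B;" is identical in both cases. Only the type_names argument changes the result: parse_statement(tokens, {"A"}) yields a declaration, and parse_statement(tokens, set()) yields an expression.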

mahmoudimus today at 7:49 AM

I think you're referring to this one: https://github.com/jhjourdan/C11parser
