Quickly looking at the source code, mostly treeBuilder and tokenizer, I do see several possible improvements:
- Use TypeScript instead of JavaScript
- Use perfect hashes instead of ["a", "b", "c"].includes() idioms, string equalities, Sets, etc.
- Use a single perfect hash to match all tag/attribute names and then use enums in the rest of the codebase
- Use a single if (token.kind === Tag.START) instead of repeating that for 10 consecutive conditionals
- Don't return the "reprocess" constant, but use an enum or perhaps nothing if "reprocess" is the only option
- Try tail recursion instead of a switch over the state in the tokenizer
- Use switches (best after a perfect hash lookup) instead of multiple ifs on characters in the tokenizer
- "treeBuilder.openElements = treeBuilder.open_elements;" can't possibly be good code
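To make a few of the suggestions above concrete, here is a rough sketch of resolving a tag name to an integer id once, checking the token kind once, and then switching on the id. Every name in it (TokenKind, Tag, TAG_IDS, tagId, makeStartTag, describeToken) is made up for illustration and is not taken from justjshtml. A plain Map stands in for a generated perfect hash, so this shows the shape of the idea rather than the actual hashing technique.

```javascript
// Hypothetical names throughout; this is not the justjshtml API.
const TokenKind = Object.freeze({ START: 0, END: 1, TEXT: 2 });
const Tag = Object.freeze({ UNKNOWN: 0, HTML: 1, HEAD: 2, BODY: 3, P: 4, DIV: 5 });

// Assumption: a Map stands in for a generated perfect hash.
// Either way, the name is resolved with one lookup instead of many string compares.
const TAG_IDS = new Map([
  ["html", Tag.HTML],
  ["head", Tag.HEAD],
  ["body", Tag.BODY],
  ["p", Tag.P],
  ["div", Tag.DIV],
]);

// Map a tag name to its integer id, falling back to UNKNOWN.
function tagId(name) {
  return TAG_IDS.get(name) ?? Tag.UNKNOWN;
}

// The tokenizer would resolve the id once, when it emits the token.
function makeStartTag(name) {
  return { kind: TokenKind.START, name, tag: tagId(name) };
}

function describeToken(token) {
  // Check the token kind once, then switch on the integer tag id.
  if (token.kind === TokenKind.START) {
    switch (token.tag) {
      case Tag.HTML:
      case Tag.HEAD:
      case Tag.BODY:
        return "structural";
      case Tag.P:
      case Tag.DIV:
        return "block";
      default:
        return "other";
    }
  }
  return "not-a-start-tag";
}
```

The rest of the code then only compares small integers, which is what the enum suggestion is after; whether that is measurably faster than Set lookups in a given engine would need benchmarking.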
Perhaps the agent can find these itself if told to make the code perfect and not just pass the tests
Thanks for the feedback - I pasted it into a Claude Code session on my phone, here's the resulting PR: https://github.com/simonw/justjshtml/pull/7
I didn't include the TypeScript bit though - the project doesn't use TypeScript because I don't like adding a build step to my JavaScript projects if I can possibly avoid it. The agent would happily have used TypeScript if I had let it.
I don't like that openElements = open_elements pattern either - it did that because I asked it for a port of a Python library and it decided to support the naming conventions for both Python and JavaScript at once. I told it to remove all of those.
I had it run a micro benchmark too against the before and after - here's the code it used for that: https://github.com/simonw/justjshtml/blob/a9dbe2d7c79522a76f...
After applying your suggestions:
It pushed back against the tail recursion suggestion:
> The current implementation uses a switch statement in step(). JavaScript doesn't have proper tail call optimization (only Safari implements it), so true tail recursion would cause stack overflow on large documents.
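For anyone wondering what the loop-plus-switch shape looks like, here is a minimal sketch of a state machine driven by a single loop, so the stack depth stays constant however long the input is. It is a toy with two states and invented names (State, tokenize, the token shape), not the project's step() implementation.

```javascript
// Hypothetical mini-tokenizer; state names and token shape are illustrative only.
const State = Object.freeze({ DATA: 0, TAG_NAME: 1 });

function tokenize(input) {
  const tokens = [];
  let state = State.DATA;
  let text = "";
  let name = "";

  // One loop iteration per character: no recursion, so no stack growth.
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    switch (state) {
      case State.DATA:
        if (ch === "<") {
          // Flush any pending text before starting a tag.
          if (text) tokens.push({ kind: "text", data: text });
          text = "";
          name = "";
          state = State.TAG_NAME;
        } else {
          text += ch;
        }
        break;
      case State.TAG_NAME:
        if (ch === ">") {
          tokens.push({ kind: "tag", name });
          state = State.DATA;
        } else {
          name += ch;
        }
        break;
    }
  }

  // Flush trailing text; an unterminated tag at end of input is simply dropped here.
  if (state === State.DATA && text) tokens.push({ kind: "text", data: text });
  return tokens;
}
```

A version where each state handler calls the next handler directly would recurse once per character, which is where the stack overflow concern comes from in engines without proper tail calls.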