Hacker News

cxr today at 12:37 AM (4 replies)

Few know that Firefox's HTML5 parser was originally written in Java, and only afterward semi-mechanically translated (pre-LLMs) to the dialect of C++ used in the Gecko codebase.

This blog post isn't really about HTML parsers, however. The JustHTML port it describes was worthwhile as a demonstration in its own right.

Even so, I suspect that for this particular application, it would have been more productive/valuable to port the Java codebase to TypeScript rather than using the already vibe coded JustHTML as a starting point. Most of the value of what is demonstrated by JustHTML's existence in either form comes from Stenström's initial work.


Replies

simonw today at 12:54 AM

Whoa... it looks like the Firefox HTML5 parser is still maintained as Java to this day!

Here's the relevant folder:

https://github.com/mozilla-firefox/firefox/tree/main/parser/...

  make translate        # perform the Java-to-C++ translation from the remote
                        # sources
And active commits to that javasrc folder - the last was in November: https://github.com/mozilla-firefox/firefox/commits/main/pars...

simonw today at 12:48 AM

There are certainly dozens of better ways to do what I did here.

I picked JustHTML as a base because I really liked the API Emil had designed, and I also thought it would be darkly amusing to take his painstakingly (1,000+ commits, 2 months+ of work) constructed library and see if I could port it directly to JavaScript in an evening, taking advantage of everything he had already figured out.

QuantumNomad_ today at 1:53 AM

IANAL. In my opinion, porting code to a different language still produces a derivative work of the code you are porting from, whether done by hand or with an LLM. And in my opinion, the license of the original code still applies. That means one should not only link to the repo for the code that was ported, but also adhere to the terms of the license.

The MIT family of licenses states that the copyright notice and permission notice shall be included in all copies or substantial portions of the software.

Porting code to a different language is in my opinion not much different from forking a project and making changes to it, small or big.

I therefore think the right thing to do is to keep the original copyright notice and license file, and add your own copyright line to it.

So for example if the original project had an MIT license file that said

Copyright 2019 Suchandsuch

Permission is hereby granted and so on

You should keep all of that and add your copyright year and author name on the next line after the original line or lines of the authors of the repo you took the code from.
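
As a rough illustration of that last step, here is a minimal Python sketch that inserts a new copyright line directly after the existing ones and leaves the rest of the license text untouched. The function name `add_copyright_line` and the "Copyright 2025 Porter Name" line are made up for the example, and it assumes the notice lines start with the word "Copyright"; it is not legal advice or anyone's official tooling.

```python
def add_copyright_line(license_text: str, new_line: str) -> str:
    """Insert new_line after the last existing 'Copyright' line.

    Hypothetical helper: assumes copyright notices start with 'Copyright'.
    Everything else in the license text is kept exactly as it was.
    """
    lines = license_text.splitlines()
    # Index of the last line that looks like a copyright notice, or -1 if none.
    last = max(
        (i for i, line in enumerate(lines) if line.strip().startswith("Copyright")),
        default=-1,
    )
    if last == -1:
        raise ValueError("no existing copyright line found; refusing to guess")
    return "\n".join(lines[: last + 1] + [new_line] + lines[last + 1 :])


# The placeholder license text from the example above.
original = "Copyright 2019 Suchandsuch\n\nPermission is hereby granted and so on"
print(add_copyright_line(original, "Copyright 2025 Porter Name"))
```

The original author's line stays first, and the sketch raises an error rather than guessing when it finds no existing notice.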

fergie today at 11:45 AM

Surely for debugging and auditing it's always better to write libs in JavaScript? Also, given that much of TypeScript's utility lies in improving the developer experience, is it still as relevant for machine-generated code?