Hacker News

johnfn, yesterday at 3:55 PM

The author says that he runs both the reference implementation and the new Rust implementation through 2 million (!) randomly generated battles and flags every battle where the results don't line up.


Replies

simonw, yesterday at 4:00 PM

This is the key to the whole thing in my opinion.

If you ask a coding agent to port code from one language to another and don't have a robust mechanism to test that the results are equivalent, you're inevitably going to waste a lot of time and money on junk code that doesn't work.
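
For illustration, here is a minimal sketch of that kind of equivalence check (differential testing on random inputs). It is not the author's actual harness: `reference_battle`, `ported_battle`, and `differential_test` are hypothetical names, and the "battle" is a toy stand-in with a deliberately planted bug so that the harness has something to flag.

```python
import random


def reference_battle(seed: int) -> int:
    """Hypothetical stand-in for the reference implementation.

    Simulates ten hits on a 100 HP target and returns the remaining HP,
    clamped at zero.
    """
    rng = random.Random(seed)
    hp = 100
    for _ in range(10):
        hp -= rng.randint(0, 15)
    return max(hp, 0)


def ported_battle(seed: int) -> int:
    """Hypothetical stand-in for the port, with a deliberate bug."""
    rng = random.Random(seed)
    hp = 100
    for _ in range(10):
        hp -= rng.randint(0, 15)
    return hp  # planted bug: forgets to clamp at zero


def differential_test(reference, candidate, n_cases, master_seed=0):
    """Run both implementations on the same random seeds.

    Returns a list of (seed, expected, actual) for every case where the
    two results differ, so each failure can be replayed later.
    """
    rng = random.Random(master_seed)
    mismatches = []
    for _ in range(n_cases):
        seed = rng.getrandbits(32)
        expected = reference(seed)
        actual = candidate(seed)
        if expected != actual:
            mismatches.append((seed, expected, actual))
    return mismatches


n_cases = 5000
mismatches = differential_test(reference_battle, ported_battle, n_cases)
pass_rate = 1 - len(mismatches) / n_cases
print(f"{len(mismatches)} mismatches, pass rate {pass_rate:.2%}")
```

Recording the seed of every mismatch is the important design choice here: each flagged case can be replayed on its own against both implementations.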

Herring, yesterday at 4:30 PM

Yeah, and he claims a pass rate of 99.96%. At that point you might be running into bugs in the original implementation.
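
To put that number in context, here is a rough back-of-the-envelope calculation. It assumes the 99.96% pass rate applies to the 2 million battle run mentioned upthread, which the thread does not state explicitly.

```python
# Assumption: the 99.96% pass rate refers to the 2 million battle run.
total_battles = 2_000_000
pass_rate_hundredths = 9996  # 99.96%, in hundredths of a percent

# Integer arithmetic avoids floating-point rounding noise.
failures = total_battles * (10_000 - pass_rate_hundredths) // 10_000
print(failures)  # 800
```

Under that assumption, roughly 800 battles would still disagree, a set small enough to triage for bugs on either side.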