Hacker News

gertlabs, today at 7:21 AM

Success rate measures the share of code submissions that played the game/environment without hard-failing (compilation errors, breaking game rules, sandbox violations, etc.), so it makes sense that Python would do better there.

Percentile compares only the submissions that didn't hard-fail. So the two metrics measure slightly different things, and we incorporate both into the combined score.
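
To make the distinction concrete, here is a rough sketch of how the two numbers could be computed. This is only an illustration, not the actual scoring code: the `Submission` type, the use of `None` to mark a hard failure, and the "share of valid scores at or below yours" percentile definition are all assumptions. The combined score is left out because its weighting isn't described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    # Hypothetical record; score is None when the submission hard-failed
    # (compilation error, rule violation, sandbox violation, etc.).
    name: str
    score: Optional[float]


def success_rate(subs: List[Submission]) -> float:
    """Fraction of all submissions that did not hard-fail."""
    if not subs:
        return 0.0
    ok = sum(1 for s in subs if s.score is not None)
    return ok / len(subs)


def percentile(target: Submission, subs: List[Submission]) -> Optional[float]:
    """Percentile of target among non-failing submissions only.

    Assumed definition: percentage of valid scores that are <= target's.
    Returns None if the target itself hard-failed.
    """
    valid = [s.score for s in subs if s.score is not None]
    if target.score is None or not valid:
        return None
    at_or_below = sum(1 for v in valid if v <= target.score)
    return 100.0 * at_or_below / len(valid)
```

With four submissions where one hard-fails, the success rate is 0.75, while percentile is ranked over the remaining three only, so the failure drags down the first number without affecting the second.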