The 100% score, all by itself, should raise suspicion. A hundred percent? Really?
Others have already pointed out how the test was skewed (it tested for strict adherence to the law, when part of a judge's job is to make judgment calls, including when to let someone off for something that technically breaks the law but shouldn't be punished), so I won't repeat it here. But any time an LLM gets one hundred percent on a test, you should check what the test is actually measuring. I've seen people tout as a major selling point that their LLM scored 92% on some test or other. Against that backdrop, a perfect 100% should be a "smell" that automatically makes you question the result.