I checked with the team, and it may have been a temporary rate-limiting issue. We've corrected the results; it seems to be an isolated case.
Are these benchmarks correct in showing that adding Anthropic's Constitutional AI system prompt lowered results across all the models?
Thanks for the thoroughness! I look forward to the next steps as you all apply this approach in other ways to get even better results.