I'm using Judge0 for a LeetCode clone I'm working on. Never thought of using it in the context of LLMs, though.