What are these "passes" they reference here? I haven't seen that before in LLM evals.
This could definitely be interesting for having another model run over the codebase when looking for improvements.
It's the number of attempts at answering the question.
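If "passes" here means the usual pass@k metric, a problem counts as solved when at least one of k attempts is correct. The sketch below is an assumption about what the source intends. It uses the unbiased estimator popularized by the OpenAI Codex/HumanEval paper: generate n samples, count the c correct ones, and estimate the chance that a random subset of k samples contains at least one correct answer. The function name `pass_at_k` is just illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative helper).

    n: total samples generated for the problem
    c: how many of those samples were correct
    k: number of attempts the model is allowed
    Returns the probability that k samples drawn without replacement
    from the n include at least one correct one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Fewer than k incorrect samples: any draw of k must include a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 correct: one attempt vs. five attempts
    print(pass_at_k(10, 3, 1))  # 0.3
    print(pass_at_k(10, 3, 5))
```

A benchmark's overall pass@k is then usually the mean of this value across all problems, which is why scores rise as k (the number of attempts) goes up.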