Yes, this is the pass@k metric from code generation research. I found the relevant paper, "Evaluating Large Language Models Trained on Code" (Chen et al., 2021), which popularized the metric and introduced its unbiased estimator.
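For reference, here is a minimal sketch of the unbiased pass@k estimator described in Chen et al. (2021): generate n samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k drawn samples passes as 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` is my own choice for illustration, and this is not a claim about how Twill computes it.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to draw k samples that all fail.
    # It is 0 when fewer than k failing samples exist, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With 10 samples of which 3 pass, `pass_at_k(10, 3, 1)` is approximately 0.3, which matches the plain pass rate c/n for k = 1.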
Interesting, and how does Twill use it in that feature?