Hacker News

esafranchik, today at 5:25 PM

Is the benchmark measuring one-shot retrieval accuracy, or coding-agent response accuracy?


Replies

stephantul, today at 5:32 PM

Hey! Co-author here. The benchmark currently only measures retrieval accuracy.

We’re interested in measuring it end to end, and in optimizing things like the prompt and tools for that, but we just haven’t gotten around to it.
