Why can't I see Cursor on tbench? Is it that bad that it's not even on the leaderboard? I am trying to figure out if I can pitch your product to my company, and whether it is worth it.
Not sure why Cursor CLI isn't on the leaderboard... I'm guessing it's because Cursor is focused primarily on their IDE agent, not their CLI agent, and Terminal-Bench is an eval/benchmark for CLI agents exclusively.
If you're asking about why Letta Code isn't on the leaderboard, the TBench maintainers said it should be up later today (so probably refresh in a few hours!). The results are already public, you can see them on our blog (graphs linked in the OP). They are also verifiable, all data is available for the runs + Letta Code is open source, so you can replicate the results yourself.
Not sure why Cursor CLI isn't on the leaderboard... I'm guessing it's because Cursor is focused primarily on their IDE agent, not their CLI agent, and Terminal-Bench is an eval/benchmark for CLI agents exclusively.
If you're asking about why Letta Code isn't on the leaderboard, the TBench maintainers said it should be up later today (so probably refresh in a few hours!). The results are already public, you can see them on our blog (graphs linked in the OP). They are also verifiable, all data is available for the runs + Letta Code is open source, so you can replicate the results yourself.