Hacker News

pacjam, last Tuesday at 11:55 PM

Ah gotcha! In that case, I think Terminal-Bench is currently the best proxy for the "how good is this harness+agent combo at coding (quantitatively)?" question. It used to be SWE-Bench, but T-Bench is a better proxy now. Like you said though, unfortunately Cursor isn't listed (probably their choice not to list it, maybe because it doesn't place highly).


Replies

koakuma-chan, last Wednesday at 12:32 AM

Alright, I will try out Letta Code manually later then.
