pacjam • last Tuesday at 11:55 PM

Ah gotcha! In that case, I think Terminal-Bench is currently the best proxy for the "how good is this harness+agent combo at coding (quantitatively)" question. It used to be SWE-Bench, but I think T-Bench is the better proxy now. Like you said though, unfortunately Cursor isn't listed (probably their choice not to list it, maybe because it doesn't place highly).

Replies

koakuma-chan • last Wednesday at 12:32 AM

Alright, I will try out Letta Code manually later then.
