awoimbee • yesterday at 10:50 PM

The benchmarks aren't great; they're super specific to sem's output. Why would I ask Claude how many "entities" were modified by a commit, and do I need a tool specifically for this request? Note that an "entity" is a sem-specific concept...

Replies

rohanucla • yesterday at 10:52 PM

Thanks for pointing it out. I agree with you here: my testing process was quite specific to sem's output. I'd also love any suggestions on how you would design the whole testing process for this kind of tool.

I can also share my thought process: I was more interested in figuring out how the model searches and what it understands on its own, without sem.
