Hacker News

chrsw today at 12:41 AM

I don't even look at benchmarks anymore. I just try new models as they're released on our large, proprietary systems-software codebases, in real shipping products or in projects that will ship eventually. It's pretty clear which models help me do my job better or faster. I'm fortunate enough to have the token budget to use basically as much as I need, for now.

No need for benchmarks, evals, marketing, system cards or anything like that. I read the web for tips, practices and release announcements. My colleagues and I share our experiences with each other, but beyond that, everything else is just noise.


Replies

BlaDeKke today at 2:13 AM

This is the way. Not that big of a budget here. But if there’s something promising, I just try it for a month or so. But even then… at the moment I’m using z.ai models and they do the job. No need for anything else. So I’m staying until something new comes along at the same price but a lot better. (I'm on a coding plan.)