Hacker News

alistairSH yesterday at 8:14 PM

How is success defined in those metrics? Is success "perfect - can deploy to prod immediately" or "saved some arbitrary amount of engineering time"?

Anecdotal experience from my team of 15 engineers is that we rarely get "perfect", but we do get enough to see massive time savings across several common problem domains.


Replies

Esophagus4 yesterday at 10:32 PM

I think for me, it’s not so much an objective success metric as it is showing its progression over time.

What amazes me is how fast LLMs are progressing. And it still feels like early days (!).

For methodology, I would check out the METR website, though; they’ve published their results.