But that's an enormous source of coding productivity, and it's why Anthropic is worth billions. The reason SWE-bench has been so successful and useful for coding is that software engineering has a long tradition, and a ton of infrastructure, for writing and running automated tests. Those tests give a benchmark a ready-made pass/fail check on whether a model's patch actually works.