Hacker News

chatmasta, yesterday at 11:51 PM

This is similar to how our college CS problem sets were graded. We were given a spec and had to implement a program that conformed to it. We had access to 70% of the test suite during development; the other 30% was hidden and only run after submission. We were graded out of 100.

It was effective at making you think about the problem and anticipate which tests might be missing. I can see this working well for coding agents, which tend to get progressively lazier at writing tests as session context grows.
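
For anyone who wants to try the 70/30 scheme described above, here is a minimal sketch. It assumes a seeded random split and a toy spec (absolute value); the names `split_tests`, `grade`, and `make_test` are hypothetical and not from any real grader.

```python
import random
from typing import Callable, List, Tuple

# A test takes the submitted function and reports pass/fail.
Submission = Callable[[int], int]
Test = Callable[[Submission], bool]


def make_test(x: int) -> Test:
    """Hypothetical test factory: the toy spec is 'return the absolute value'."""
    return lambda f: f(x) == abs(x)


def split_tests(
    tests: List[Test], visible_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[Test], List[Test]]:
    """Shuffle with a fixed seed, then cut into (visible, hidden) parts."""
    shuffled = list(tests)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * visible_fraction)
    return shuffled[:cut], shuffled[cut:]


def grade(submission: Submission, visible: List[Test], hidden: List[Test]) -> float:
    """Score out of 100 over the full suite, hidden tests included."""
    all_tests = visible + hidden
    passed = sum(1 for t in all_tests if t(submission))
    return 100.0 * passed / len(all_tests)


# Ten tests over the inputs -5..4 (an assumption made for illustration).
tests = [make_test(x) for x in range(-5, 5)]
visible, hidden = split_tests(tests)

correct_score = grade(abs, visible, hidden)
# A buggy submission that ignores negative inputs.
buggy_score = grade(lambda x: x, visible, hidden)
```

The student (or agent) only ever sees `visible` while developing; `grade` is run once after submission. Because the score covers the whole suite, passing every visible test does not guarantee full marks, which is the incentive to think about missing cases.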