
59nadir • today at 5:37 AM • 4 replies

We've seen public examples of LLMs literally disabling or removing tests in order to pass. I'm not sure it matters much that having tests, and asking LLMs not to merge anything before those tests pass, is "easy" when the failure modes here are so plentiful and broad in nature.

Replies

jawiggins • today at 4:22 PM

You'd want to have the tests run as a GitHub Action and then fail the check if the tests don't pass. Optio will resume agents when the actions fail and tell them to fix the failures.
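A minimal sketch of such a check, assuming a Python project whose dependencies (including pytest) are listed in `requirements.txt`. The file path `.github/workflows/tests.yml` and the workflow and job names are hypothetical. A nonzero exit from `pytest` fails the step, which fails the job and turns the check red.

```yaml
# .github/workflows/tests.yml (hypothetical path and names)
name: tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        # assumes pytest and the project's dependencies are in requirements.txt
        run: pip install -r requirements.txt
      - name: Run tests
        # pytest exits nonzero on any failing test, which fails this step and the job
        run: pytest
```

The workflow on its own only reports pass/fail. Marking the job as a required status check in the repository's branch protection rules is what actually blocks the merge.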

ElFitz • today at 7:00 AM

My favourite so far was Claude "fixing" deployment checks with `continue-on-error: true`
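For context, `continue-on-error` is a real GitHub Actions key that can be set on a step or a job. A minimal sketch of the kind of change described, where the step name and script path are hypothetical:

```yaml
steps:
  - name: Verify deployment          # hypothetical step name
    run: ./scripts/check-deploy.sh   # hypothetical script
    # With this line the step may exit nonzero, yet the job still reports
    # success, so the check goes green without anything being fixed.
    continue-on-error: true
```

Removing the line restores the default (`false`), where a failing step fails the job.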

jamiemallers • today at 9:15 AM

[dead]

AbanoubRodolf • today at 6:53 AM

[dead]
