Hacker News

59nadir | today at 5:37 AM | 4 replies

We've seen public examples of LLMs literally disabling or removing tests in order to pass. Having tests, and asking LLMs not to merge anything until it passes them, may well be "easy", but I'm not sure that matters much when the failure modes here are so plentiful and broad.


Replies

jawiggins | today at 4:22 PM

You'd want to have the tests run as a GitHub Action and fail the check if they don't pass. Optio will resume agents when the actions fail and tell them to fix the failures.
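A minimal sketch of such a check, assuming a Python project tested with pytest. The file name, job name, and test command are illustrative assumptions, not taken from Optio. For a failing check to actually block merges, it would also need to be marked as a required status check in the repository's branch protection rules.

```yaml
# Hypothetical workflow file: .github/workflows/tests.yml
name: tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes a Python project; swap in your own toolchain setup.
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      # A non-zero exit code from pytest fails this step, the job, and the PR check.
      # Adding `continue-on-error: true` here would let the job pass despite failing tests.
      - run: pytest
```

This only guards the merge if the agent can't also edit the workflow file, so changes under `.github/workflows/` are worth reviewing by hand.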

ElFitz | today at 7:00 AM

My favourite so far was Claude "fixing" deployment checks with `continue-on-error: true`

jamiemallers | today at 9:15 AM

[dead]

AbanoubRodolf | today at 6:53 AM

[dead]