Hacker News

sally_glance yesterday at 4:00 PM

They absolutely can do that if you give them the tools. Seeing Claude (I use it with opencode agents) run curl and playwright to verify and then fix its implementation was a real 'wow' moment for me.


Replies

Q6T46nT668w6i3m yesterday at 5:34 PM

We have different experiences. Often I’ll see Claude et al. find creative ways to fulfill the task without satisfying my intent, e.g., changing the implementation plan I specifically asked for, changing tolerances or even tests, and frequently disabling tests.
