>The models we have now will not do it,
Except that they will, if you trick them, which is trivial.
I have to call BS here.
They can be coerced into doing certain things, but I'd like to see you or anyone prove that you can "trick" any of these models into building software that can be used to autonomously kill humans. I'm pretty certain you couldn't even get one to produce a design document for such software.
When there is proof of your claim, I'll eat my words. Until then, this is just lazy nonsense.
Yes, they are easy to fool. That has nothing to do with them acting with “intention,” which is the risk here.