Let's say we take Anthropic's security and alignment claims at face value, and they have m...

loudmax • today at 5:05 PM • 2 replies • view on HN

Let's say we take Anthropic's security and alignment claims at face value, and they have models that are really good at uncovering bugs and exploiting software.

What should Anthropic do in this case?

Anthropic could immediately make these models widely available. The vast majority of their users just want develop non-malicious software. But some non-zero portion of users will absolutely use these models to find exploits and develop ransomware and so on. Making the models widely available forces everyone developing software (eg, whatever browser and OS you're using to read HN right now) into a race where they have to find and fix all their bugs before malicious actors do.

Or Anthropic could slow roll their models. Gatekeep Mythos to select users like the Linux Foundation and so on, and nerf Opus so it does a bunch of checks to make it slightly more difficult to have it automatically generate exploits. Obviously, they can't entirely stop people from finding bugs, but they can introduce some speedbumps to dissuade marginal hackers. Theoretically, this gives maintainers some breathing space to fix outstanding bugs before the floodgates open.

In the longer run, Anthropic won't be able to hold back these capabilities because other companies will develop and release models that are more powerful than Opus and Mythos. This is just about buying time for maintainers.

I don't know that the slow release model is the right thing to do. It might be better if the world suffers through some short term pain of hacking and ransomware while everyone adjusts to the new capabilities. But I wouldn't take that approach for granted, and if I were in Anthropic's position I'd be very careful about about opening the floodgate.

Replies

recallingmemory • today at 6:18 PM

Couldn't we use domain records to verify that a website is our own for example with the TXT value provided by Anthropic?

Google does the same thing for verifying that a website is your own. Security checks by the model would only kick off if you're engaging in a property that you've validated.

pingou • today at 5:09 PM

Or they could check if the source is open source and available on the internet, and if yes refuse to analyse it if the person who request the analysis isn't affiliated to the project.

That will still leave closed source software vulnerable, but I suspect it is somewhat rare for hackers to have the source of the thing they are targeting, when it is closed source.

➕ show 1 reply

alt Hacker News

Replies