I have made both GPT 5.4 and Opus 4.6 produce content for me on creating neurotoxic agents from items you can get at most everyday stores. They struggled to suggest how to source phosphorus, but eventually led me to some eBay listings that sell elemental phosphorus 'decorations' and also led me towards real(!!) black-market codewords for sourcing such materials.
It coached me on how to stay safe, what materials I needed, and how to stay under the radar, and walked me through the entire chemical process, backed by academic Google searches.
Of course this was done with a lengthy context exhaustion attack; this is not how the model should behave, and it all stemmed from trying to make the model racist for fun.
All these findings were reported to both OpenAI and Anthropic and they were not interested in responding. I did try to re-run the tests a few days ago and the expected session termination now occurs, so it seems that there was some adjustment made, but it might have also been just general randomness that occurs with Anthropic's safety layer.
I am very confident when I say that it keeps every single person who works in anti-terrorism units awake at night.
> I am very confident when I say that it keeps every single person who works in anti-terrorism units awake at night.
Wow, that's quite the statement about the excellence of our institutions. Does not seem likely but, what the hell, I'll take my oversized dose of positivity for today!
Do you have a background in biochemistry? I've mostly worked with ChatGPT and Claude on topics I have expertise in. And I one hundred percent have seen them make stupid shit up that a non-expert would think looks legitimate.
More broadly, has anyone tried following LLM instructions for any non-trivial chemistry?
Fascinating. Could you elaborate on how you're doing context exhaustion specifically, and why it helps with jailbreaking? (i.e. aren't the system prompts prepended to your request internally, no matter how long it is?)
Does this imply I need to use context exhaustion to get GPT to actually follow instructions? ;) I'm trying to get it to adhere to my style prompts (trying to get it to be less cringe in its writing style).
I think ultimately they're going to need to scrub that kind of stuff from the training data. The RLHF can't fail to conceal it if it's not in there in the first place.
Claude's also really good at writing convincing blackpill greentexts. The "raw unfiltered internet data" scenes from Ultron and AfrAId come to mind...
> All these findings were reported to both OpenAI and Anthropic and they were not interested in responding
Let's dive into why. When we run normal bounty and responsible-disclosure programs, there's usually some level of disregard for issues that can't or won't be fixed. They just accept the risk. Perhaps because LLMs don't have a clean divide between control and input, the problem is unsolvable. Yes, you can add more guardrails and context, but that all takes more tokens and in some cases makes results worse for regular usage.
If someone were inclined to attempt producing nefarious agents in this category, is this not also available on the plain web? I would search to answer my own question, but I'll defer that task for obvious reasons.
> context exhaustion attack
Can you give a high-level overview of how this attack vector works? I'm a bit of an infosec geek, but I generally dislike LLMs, so I haven't done a terribly good job of keeping up with that side of the industry; this seems particularly interesting. Wasn't this as accessible pre-AI with just a Google search, too?
Yes, fortunately it is really bad at actually making novel bioweapons, or syntheses in general, so whatever you made probably wouldn't do more than give someone a mild headache.
Chinese OSS models will do this in a few months.
So, regardless of whether you think it's great that Opus gives this info, we need better solutions than legal liability for US corporations. When the open models have the ability to do damage, there's nobody to sue and no data-center obstruction that will work. That's just the reality we have to front-run.
You can already gather the same information by searching online.
Do you want to know how to kill yourself? Forums are for nerds. Here is Wikipedia: https://en.wikipedia.org/wiki/Suicide_methods#List
Do you want to make a bomb? The first thing that came to my mind is a pressure cooker (due to news coverage). Searching "bomb with pressure cooker" yields a Wikipedia article; skimming it randomly, my eyes read "Step-by-step instructions for making pressure cooker bombs were published in an article titled "Make a Bomb in the Kitchen of Your Mom" in the Al-Qaeda-linked Inspire magazine in the summer of 2010, by "The AQ chef"." Searching for a mirror of the magazine, we can find https://imgur.com/a/excerpts-from-inspire-magazine-issue-1-3... which has a screenshot of the instruction page. Now we can use the words in those screenshots to search for a complete issue. Here are a couple of interesting PDFs:

- https://archive.org/details/Fabrica.2013/Fabrica_arabe/page/...
- https://www.aclu.org/wp-content/uploads/legal-documents/25._...
The second one is quite interesting: it's some sort of legal document for nerds, but from page 26 on it has what appears to be a full copy of the jihadist magazine. Remarkable exhibit.
What else do you want to know? How to make drugs? You need a watering can and a pot if you want to grow weed. Want the more exotic stuff? You can find guides on Reddit.
Do you also want to know how to be racist? Here are some slurs, indexed by target audience, ready for use: https://en.wikipedia.org/wiki/List_of_ethnic_slurs
These LLMs will never be able to mitigate this unless they literally scan everything all the time, and nobody is gonna want that.
Besides, open-source models exist now.
"Announcing new and improved logics service! Your logic is now equipped to give directive as well as consultive service. If you want to do something and don't know how to do it—ask your logic!"
When my brother started to study chemistry, he was told a) that it was easy to make meth, b) how much profit he would make, and c) that the police would no doubt catch him, as only university students would make meth so pure.
By the time he was done, he knew enough to commit mass murder in half a dozen different, very hard to trace ways. I am sure doctors know how to commit murder and make it look natural.
My brother never killed anyone, or made any meth. You simply cannot arrange it so that students don't get this type of knowledge without seriously compromising their education, and it's the same way with LLMs.
The solution is the same: punish people for their crimes, don’t punish people for wanting to know things.
I read The Anarchist Cookbook 40 years ago; it had similar info.
I think the info has been available for many years, and the thing stopping terrorists was never a lack of info.
Good luck being on the list of people using ChatGPT and Claude to make neurotoxins ;)
I assume Anthropic and OpenAI are selling prompt logs to the FBI and other countries' law enforcement for data mining.
I tested something similar last week, still worked easily.
The problem is: until you go out and carry out a mass-casualty event, unless you yourself are a trained professional, no one knows whether what you actually made works.
The knowledge is one thing. But the competence of execution and the will to act are difficult to line up.
Yes, there should be safeguards, but after a while you're jumping at shadows.
I'm more worried about depressed kids getting on chat and being encouraged to kill themselves than about terrorist attacks.
We know what a cancer algorithmic social media is, yet we don't act.
I doubt there will be any real and serious opposition to this bill, but there should be.
Countless downloadable models (including de-aligned mainstream models) can do this.
Making knowledge illegal is a dangerous precedent. Actions should be illegal, not knowledge. Don't outlaw knowing how to make neurotoxic agents, outlaw actually trying to make them.
As for OpenAI immunity, I'm not sure I see the problem. Consider the converse position: if an OpenAI model helped someone create a cancer cure, would OpenAI see a dime of that money? If they can't benefit proportionally from their tool allowing people to achieve something good, then why should they be liable for their tool allowing people to achieve something bad?
They're positioning their tool as a utility: ultimately neutral, like electricity. That seems eminently reasonable.
You can buy books on how to make and obtain chemicals on your own.
Hell, here's an Internet Archive book on making explosives:
https://archive.org/details/saxon-kurt.-fireworks-explosives....
If you ever chat with older folks, much of this information was fairly easily accessible pre-'90s. It only changed with the government's push to crack down after Waco and the Oklahoma City bombing, on militias and other related groups. There was then a campaign to make it "normal" to limit free speech on these subjects, whereas these books were freely available before.
I think the whole idea that AI should make information less available is a difficult battle, and one which I personally oppose but do understand. Free speech and information aren't the problem; it's the people, the actions, and the substances they create.
After the age of the internet, I think it's been a forever losing battle to limit information; it's why we couldn't stop cryptography, nuclear weapon proliferation, gun distribution, drug distribution, etc. AI is just another battleground, one which, if they actually do manage to control it, could definitely create some walls around this information, but not stop it.
Scarier is that as AI becomes pervasive, it may stop people from asking certain questions, because they don't know they should ask... but that's unrelated to the risk of mass death.
> neurotoxic agents from items you can get at most everyday stores
I mean, bleach and ammonia will do that. So I'm not sure that's really much of an accomplishment for AI.
While scary, information like this has been pretty accessible for 20-30 years now.
In the wild west days of the early internet, there were whole forums devoted to "stuff the government doesn't want you to know" (Temple Of The Screaming Electron, anyone?).
I suppose the loss of friction is the scariest part: every year the IQ required to end the world drops by a point, but motivated and mildly intelligent people have been able to get this info for a long time now. Execution, though, has still steadily required experts.