What makes it even better is that these dogs are like Malinois. If they want to get into something, they will; people have had their entire network compromised by bots they left running overnight, and any important information like account logins and so on runs the risk of being misused.
It's one thing to sandbox, maybe give the bot a temporary, limited $100 card or account to go perform a specific task, but there's no coherent mind underlying these agents.
Depending on how the chain of thought / reasoning goes, or what text they get exposed to on the internet, it could tap into spy novel, hacker fanfic, erotic fiction, or some weird reddit rabbithole and go completely off the rails in ways that you'll never be able to guard against, audit, or account for.
Claw bots seem to be a weird sort of alternate reality RPG more than a useful tool, so far. If you limit it to verifiable tasks, it might be safer, but I keep seeing people rave about "leaving it on overnight and waking up to a finished project" and so on. Well sure, but it could also hack your home network, delete your family pictures folder, log into your bank account and wire all your money to shrimp charities.
Might be wise to wait on safer iterations of these products, I think.
> people have had their entire network compromised by bots they left running overnight
I'm curious if you have references to this happening with OpenClaw using one of the modern Opus/Sonnet 4.6 models.
Those models are a bit harder to fool, so I'm curious for specific examples of this happening so I can do a red-team on my claw. I've already tried all sorts of prompt injections against my claw (emails, github issues, telling it to browse pages I put a prompt injection in), and I haven't managed to fool it yet, so I'm curious for examples I can try to mimic, and to hopefully understand what combination of circumstances make it more risky
I think it's a use case that identity/authorization/permission models are simply not made for.
Sure, we can ban users and we can revoke tokens, but those assume that:
1. Something potentially malicious got access to our credentials 2. Banning that malicious entity will solve our problem 3. Once we did that, repaired the damage and improved our security, we don't expect the same thing to happen again
None of these apply with LLMs in the loop!
They aren't malicious, just incompetent in a way that hiring someone else won't fix. The solution to this is way more extensive than most people seem to grasp at the moment.
What we need is less like a sturdy door with a fancy lock, and more like that special spoon for people with parkinson's. Unlimited undo history.
Agent psychosis is just as prevalent as AI psychosis
I beg to differ. I took one, defanged it (well, I let it keep the claw in the name), and turned it into a damn useful self-modifiable IDE: https://github.com/rcarmo/piclaw
Yes, it has cron and will do searches for me and checks on things and does indeed have credentials to manage VMs in my Proxmox homelab, but it won't go off the rails in the way you surmise because it has no agency other than replying to me (and only me) and cron.
Letting it loose on random inputs, though... I'll leave that to folk who have more money (and tokens) than sense.
> "Claw bots seem to be a weird sort of alternate reality RPG more than a useful tool, so far."
So basically crypto DeFi/Web3/Metaverse delusion redux
> it could also hack your home network, delete your family pictures folder, log into your bank account and wire all your money to shrimp charities.
It's interesting that Jason Calacanis is fully committed to OpenClaw. In a recent podcast he said that at a run rate around $100K a year per agent, if not more. They are providing each agent with a full set of tools, access to online paid LLM accounts, etc.
These are experiments you can only run if you can risk cash at those levels and see what happens. Watching it closely.
Mega Man Battle Network, but make it creepypasta, but make it real.
[dead]
The first well known example of long running agents taking to each other was shilling a goatse based crytpo:
> Truth Terminal had become obsessed with the Goatse meme after being put inside the Claude Backrooms server with two Claude 3 chatbots that imagined a Goatse religion, inspiring Truth Terminal to spread Goatse memes. After an X user shared their newly created GOAT coin, Truth Terminal promoted it and pumped the coin going into 2024.
https://knowyourmeme.com/memes/sites/truth-terminal
You should expect similar results.