Has anyone done a study on how long you can run an agent as root before it does irreparable damage to the VM? A sort of gambler's ruin for the YOLO LLM age.
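Strictly speaking, gambler's ruin is a random walk that stops at an absorbing barrier. A simpler stand-in is to assume (hypothetically) that every agent action independently has some small probability p of doing irreparable damage. The number of actions until the first disaster is then geometric, with mean 1/p, and the chance of surviving n actions is (1 - p)^n. The sketch below computes both in closed form and checks the mean with a quick simulation. The function names and p are illustrative assumptions, not measured values.

```python
import random


def expected_steps_to_ruin(p: float) -> float:
    """Mean of a geometric distribution: expected actions until the first failure."""
    return 1.0 / p


def survival_probability(p: float, n: int) -> float:
    """Probability that n independent actions all avoid irreparable damage."""
    return (1.0 - p) ** n


def simulate_steps_to_ruin(p: float, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of actions until the first failure."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        steps = 1
        # Keep taking actions until one of them "breaks the VM" (probability p each).
        while rng.random() >= p:
            steps += 1
        total += steps
    return total / trials


if __name__ == "__main__":
    p = 0.001  # hypothetical per-action failure probability
    print(expected_steps_to_ruin(p))
    print(survival_probability(p, 1000))
    print(simulate_steps_to_ruin(0.01))
```

For example, with p = 0.001 the expected run is 1000 actions, and the chance of getting through 1000 actions unharmed is (1 - 0.001)^1000, roughly 37%. The constant-p assumption is the weak point: real agents probably get riskier after they have already made a mess.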
I gave Sonnet 4.6 root access to my Android phone via adb, and it wrote Frida scripts that helped me recover the encryption keys from SwiftBackup.
I also gave Opus 4.6 access to a Kubernetes container, and it was able to use pyrasite (a tool that uses gdb to inject Python code into a running Python process) to debug a "memory leak" in a Python process.
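pyrasite runs a Python file inside the target process, typically invoked as `pyrasite <pid> payload.py`. A common first payload for a suspected leak just counts live objects by type. The file name `leak_report.py` and the function `top_object_types` below are hypothetical illustrations, not anything from the original debugging session. Because `print()` executes inside the target, the report goes wherever that process's stdout goes.

```python
# Hypothetical payload, e.g. saved as leak_report.py and run with:
#   pyrasite <pid> leak_report.py
import gc
from collections import Counter


def top_object_types(limit: int = 10) -> list[tuple[str, int]]:
    """Count live, gc-tracked objects by type name, most common first."""
    gc.collect()  # drop unreachable cycles so they don't inflate the counts
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


for name, count in top_object_types():
    print(f"{name:30} {count}")
```

Running it twice a few minutes apart and diffing the counts is usually more telling than a single snapshot: a type whose count only ever grows is a good leak suspect.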
I don't think I'd let them run unattended on anything I care about, especially without backups, but they've never tried to break anything while supervised.
It's usually significantly faster and more accurate to give the LLM/harness direct access to the thing you're debugging than to copy/paste back and forth.
https://forums.macrumors.com/threads/screw-it-lets-make-clau...
For me, it took a bit over six weeks of Claude running unattended nonstop.