I use shellbox.dev to create sandboxes over SSH, without ever leaving the terminal.
The sandbox-or-not debate is important but it's only half the picture. Even a perfectly sandboxed agent can still generate code with vulnerabilities that get deployed to production - SQL injection, path traversal, hardcoded secrets, overly permissive package imports.
The execution sandbox stops the agent from breaking out during development, but the real risk is what gets shipped downstream. I'm seeing more tools now that scan the generated code itself rather than just containing the execution environment.
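To make "scan the generated code itself" concrete, here is a deliberately naive sketch. The rule names and regexes are my own illustration, not any particular tool's rule set, and real scanners rely on parsing and data-flow analysis rather than line-by-line pattern matching.

```python
import re

# Hypothetical toy rules for illustration only; they will miss plenty
# and can flag harmless code.
RULES = {
    # A string literal assigned directly to a secret-looking name.
    "hardcoded-secret": re.compile(
        r"(?i)\b(password|secret|api_key|token)\s*=\s*['\"][^'\"]+['\"]"
    ),
    # execute() called with an f-string or with string concatenation,
    # instead of a parameterized query.
    "sql-string-building": re.compile(
        r"execute\(\s*(f['\"]|.*['\"]\s*\+)"
    ),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running `scan()` on agent output before it is committed would flag an f-string passed to `execute()` while leaving a parameterized call such as `execute("... WHERE id = %s", (uid,))` alone. It is a cheap pre-commit check, not a replacement for review.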
That is quite an involved setup to get a costly autocomplete going.
Is that really where we are at? Just outsource convenience to a few big players that can afford the hardware? Just to save on typing and god forbid…thinking?
“Sorry boss, I can’t write code because cloudflare is down.”
I was looking for a way to isolate my agents in a more convenient way, and I really love your idea. I'm going to give this a try over the weekend and will report back.
But the one-time setup seems like a really fair investment for more secure development. Of course, this will not help with the problem of malicious code reaching production. But I think it will, with a little overhead, make local development much more secure.
And you can automate a lot of it. And it will finally be my chance to get more into NixOS :D
A pair of containers felt a bit cheaper than a VM:
https://github.com/5L-Labs/amp_in_a_box
I was going to add Gemini / OpenCode Kilo next.
There is some upfront cost to define what endpoints to map inside, but it definitely adds a veneer of preventing the crazy…
Couldn't you replicate all of your setup with qemu microvm?
Without nix I mean
we run ~10k agent pods on k3s and went with gvisor over microvms purely for density. the memory overhead of a dedicated kernel per tenant just doesn't scale when you're trying to pack thousands of instances onto a few nodes. strict network policies and pid limits cover most of the isolation gaps anyway.
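a rough sketch of what that combination could look like, written as python that emits manifests for `kubectl apply -f -` (kubectl accepts JSON). the names `gvisor`, `agents`, and the image are placeholders, and the `runsc` handler assumes containerd on the nodes is already configured for gvisor. this is not our actual config.

```python
import json

# RuntimeClass that routes pods to gVisor. "runsc" is the conventional
# handler name, but it must match the node's containerd configuration.
RUNTIME_CLASS = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "gvisor"},
    "handler": "runsc",
}


def agent_pod(name: str, image: str, namespace: str = "agents") -> dict:
    """Pod that runs under the gVisor RuntimeClass with a memory cap."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runtimeClassName": "gvisor",
            "containers": [{
                "name": "agent",
                "image": image,
                # Placeholder value, not a recommendation.
                "resources": {"limits": {"memory": "256Mi"}},
            }],
        },
    }


def default_deny(namespace: str = "agents") -> dict:
    """NetworkPolicy that blocks all ingress and egress in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


# PID limits are not part of the pod spec; they are set on the kubelet
# (the podPidsLimit field of the kubelet configuration).

if __name__ == "__main__":
    docs = [RUNTIME_CLASS, default_deny(), agent_pod("agent-0", "example/agent:latest")]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": docs}, indent=2))
```

you would then add narrower allow policies on top of the default deny for whatever endpoints the agents actually need.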
Is there a way to make this work with macOS hosts, preferably without having to install a Linux toolchain inside the VM for the language the agent will be writing code in?