The weird thing is that this is probably a performance optimization for quick responses when a user asks a question.
My agent harness spins up a VM too, but it spins up on demand, cools down after 10 minutes of inactivity, and warms up again when I focus back on the app.
The files it works on actually live in a mount.
On a fast machine, people take more time to type a prompt than the VM takes to spin up, and on a slow machine, the cooldown naturally frees RAM back to the machine.
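
For anyone curious, the on-demand lifecycle can be sketched roughly like this. It is a minimal illustration, not my actual harness: `VmLifecycle`, `touch`, `tick`, and the `start_vm`/`stop_vm` callbacks are hypothetical names, and I'm assuming "cooldown" means stopping the VM once nothing has touched it for 10 minutes.

```python
import time
from typing import Callable

COOLDOWN_SECONDS = 10 * 60  # the 10-minute cooldown mentioned above


class VmLifecycle:
    """Tracks whether a hypothetical agent VM should be running."""

    def __init__(
        self,
        start_vm: Callable[[], None],   # hypothetical hook that boots the VM
        stop_vm: Callable[[], None],    # hypothetical hook that shuts it down
        clock: Callable[[], float] = time.monotonic,
        cooldown: float = COOLDOWN_SECONDS,
    ) -> None:
        self._start_vm = start_vm
        self._stop_vm = stop_vm
        self._clock = clock
        self._cooldown = cooldown
        self._running = False
        self._last_activity = clock()

    @property
    def running(self) -> bool:
        return self._running

    def touch(self) -> None:
        """Call on app focus or prompt submit: record activity, boot if cold."""
        self._last_activity = self._clock()
        if not self._running:
            self._start_vm()
            self._running = True

    def tick(self) -> None:
        """Call periodically: stop the VM once it has idled past the cooldown."""
        idle = self._clock() - self._last_activity
        if self._running and idle >= self._cooldown:
            self._stop_vm()
            self._running = False
```

The app would call `touch()` from its focus and prompt handlers and `tick()` from a timer. Since the working files live in a mount outside the VM, stopping it should not lose them.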