I've been thinking a lot about the primitives we should be reaching for to standardize these agent systems. There seems to be a degree of isomorphism between skills, MCP, and AGENTS.md. Shell and apply seem pretty fundamental due to the UNIX legacy. Happy to see sandboxing finally included as a first-class concept!
I wouldn't mind something a bit deeper, though, like a standardization around tokenization that could allow for some extensibility.
The separation of harness from compute is the right architectural move. The part that's still missing from most agent frameworks is the verification layer between steps. Sandbox execution solves the safety problem. It doesn't solve the accuracy problem. Those are different failure modes that need different infrastructure.
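To make the distinction concrete, here is a rough sketch of a per-step verification gate. Everything in it is hypothetical and not taken from any particular framework: `StepResult`, `run_with_verification`, and the toy steps are names made up for illustration, and the "sandbox" is reduced to an exit code. The point is only that a step can exit cleanly (the safety check passes) and still be wrong (the accuracy check fails).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    output: str
    exit_code: int  # roughly what a sandbox reports: the step ran and terminated


# Hypothetical types: a step produces a result, a verifier judges that result.
Step = Callable[[], StepResult]
Verifier = Callable[[StepResult], bool]


def run_with_verification(steps: List[Tuple[Step, Verifier]]) -> List[StepResult]:
    """Run steps in order, refusing to continue past an unverified result."""
    accepted: List[StepResult] = []
    for index, (step, verify) in enumerate(steps):
        result = step()
        # Execution check: did the step run to completion at all?
        if result.exit_code != 0:
            raise RuntimeError(f"step {index} failed to execute")
        # Accuracy check: a clean exit says nothing about correctness.
        if not verify(result):
            raise ValueError(f"step {index} ran cleanly but produced a wrong result")
        accepted.append(result)
    return accepted


# Toy steps standing in for sandboxed tool calls (assumed for illustration).
def add_step() -> StepResult:
    return StepResult(output=str(2 + 2), exit_code=0)


def buggy_step() -> StepResult:
    return StepResult(output=str(2 + 3), exit_code=0)  # exits 0, answer is wrong


def expects_four(result: StepResult) -> bool:
    return result.output == "4"
```

With this, `run_with_verification([(add_step, expects_four)])` returns the accepted result, while swapping in `buggy_step` raises `ValueError` even though its exit code is 0. In a real system the verifier would be something much heavier (tests, a type checker, a second model), but the gate sits in the same place.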