I have been working on a Beads alternative because of two reasons:
1) I didnt like that Beads was married to git via git hooks, and this exact problem.
2) Claude would just close tasks without any validation steps.
So I made my own that uses SQLite and introduced what I call gates. Every task must have a gate, gates can be reused, task <-> gate relationships are unique so a previous passed gate isnt passed if you reuse it for a new task.
I havent seen it bypass the gates yet, usually tells me it cant close a ticket.
A gate in my design is anything. It can be as simple as having the agent build the project, or run unit tests, or even ask a human to test.
Seems to me like everyones building tooling to make coding agents more effective and efficient.
I do wonder if we need a complete spec for coding agents thats generic, and maybe includes this too. Anthropic seems to my knowledge to be the only ones who publicly publish specs for coding agents.
Great minds... I built my own memory harness, called "Argonaut," to move beyond what I thought were Beads' limitations, too. (shoutout to Yegge, tho - rad work)
Regarding your point on standards... that's exactly why I built AAP and AIP. They're extensions to Google's A2A protocol that are extremely easy to deploy (protocol, hosted, self-hosted).
It seemed to me that building this for my own agents was only solving a small part of the big problem. I need observability, transparency, and trust for my own teams, but even more, I need runtime contract negotiation and pre-flight alignment understanding so my teams can work with other teams (1p and 3p).