I do something similar, but across three doc types: design, plan, and debug.
Design works similarly to your project.md file, but per feature request. I also explicitly ask it to outline open questions/unknowns.
Once the design doc (i.e. `design/[feature].md`) has been sufficiently iterated on, we move to the plan doc(s).
The plan docs are structured like `plan/[feature]/phase-N-[description].md`
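As a rough sketch of that naming convention (the helper names and the slug rules here are my own assumptions, not part of the workflow as described), the paths could be generated like this:

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def design_doc(feature: str) -> Path:
    """Path for a feature's design doc: design/[feature].md (hypothetical helper)."""
    return Path("design") / f"{slugify(feature)}.md"


def plan_doc(feature: str, phase: int, description: str) -> Path:
    """Path for one plan phase: plan/[feature]/phase-N-[description].md (hypothetical helper)."""
    return Path("plan") / slugify(feature) / f"phase-{phase}-{slugify(description)}.md"


if __name__ == "__main__":
    print(design_doc("Dark Mode").as_posix())
    print(plan_doc("Dark Mode", 2, "Wire up toggle").as_posix())
```

Keeping the phase number in the filename means a plain directory listing shows the phases in order, at least until you pass phase 9.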
From here, the agent iterates until the plan is "done", only stopping if it encounters some build/install/run limitation.
At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.
We review these hypotheses, sometimes iterate, and then tackle them one by one.
An important note for debug flows: as with manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis before moving directly to a fix.
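To make the "instrument first, fix later" idea concrete, here is a minimal sketch. The function, the bug, and the hypothesis are all made up for illustration; the point is only that the instrumentation records evidence and leaves behavior unchanged.

```python
import logging

# One logger per hypothesis makes it easy to filter the evidence later.
logger = logging.getLogger("debug.hypothesis_1")


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under investigation.

    Hypothesis 1 (made up for this example): some caller passes a percent
    outside 0..100, which produces negative prices.
    """
    # Instrumentation only: log the inputs, flag the suspicious case,
    # and deliberately do NOT change the result yet.
    logger.debug("apply_discount price_cents=%r percent=%r", price_cents, percent)
    if not 0 <= percent <= 100:
        logger.warning("H1 evidence: percent out of range: %r", percent)
    return price_cents - (price_cents * percent) // 100


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    apply_discount(1000, 15)
    apply_discount(1000, 150)  # triggers the H1 warning
```

If the warning never fires in a real run, the hypothesis is probably wrong and you have saved yourself a speculative fix.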
Using this method has led to a 100% vibe-coded success rate both on greenfield and legacy projects.
Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.
> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.
I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.
You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.
We just dropped a blog post about it: https://www.dbos.dev/blog/mcp-agent-for-durable-workflows
> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.
FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, parent/child issue relationships, and/or labels ("epic", "feature", "bug", etc.). Your markdown moves from files to issue entries hidden away in a JSONL file, with a local DB as cache.
Your current file-system "UI" vs. the Beads command-line UI is obviously a big difference.
Beads provides a kind of conceptual bottleneck, which I think helps when using it with LLMs. Beads is more self-documenting, while a file system can be "anything".
My "heavy" workflow for large changes is basically as follows:
0. Create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs, but also for logs, random JSON responses you captured to a file, etc.
1. Ask the agent to create a file for the change, rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with 0 emphasis put on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.
2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't missing any assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.
3. Spec out behavior (UI, CLI etc). The agent is allowed to ask for decisions here.
4. Given the functional spec, figure out the technical architecture, same workflow as above.
5. High-level plan.
6. Detailed plan for the first incomplete high-level step.
7. Implement, manually review code until satisfied.
8. Go to 6.
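A rough sketch of how steps 0 to 6 could be scaffolded, in case it helps. Everything here is an assumption on my part: the `.agent-docs` directory name, the section titles, and the use of markdown checkboxes to mark high-level steps as done.

```python
from pathlib import Path
from typing import Optional

# Hypothetical section layout mirroring steps 1-6 above.
SECTIONS = [
    "Request (restated by the agent)",
    "Research notes",
    "Functional spec",
    "Technical architecture",
    "High-level plan",
    "Detailed plan (current step)",
]


def scaffold_change_doc(root: Path, change_name: str) -> Path:
    """Create <root>/.agent-docs/<change_name>.md with empty sections if it is missing."""
    docs_dir = root / ".agent-docs"  # assumed name; add it to .gitignore by hand
    docs_dir.mkdir(parents=True, exist_ok=True)
    doc = docs_dir / f"{change_name}.md"
    if not doc.exists():
        body = f"# {change_name}\n\n" + "".join(f"## {s}\n\n" for s in SECTIONS)
        doc.write_text(body, encoding="utf-8")
    return doc


def first_incomplete_step(plan_markdown: str) -> Optional[str]:
    """Return the text of the first unchecked '- [ ]' item, or None if all are done."""
    for line in plan_markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ]"):
            return stripped[len("- [ ]"):].strip()
    return None


if __name__ == "__main__":
    plan = "- [x] Add config flag\n- [ ] Migrate callers\n- [ ] Remove old path\n"
    print(first_incomplete_step(plan))
```

The "go to 6" loop then amounts to: find the first unchecked step, have the agent write a detailed plan for it, implement and review, tick the box, repeat.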