I can't point you to good, complete documentation, because the field is changing very fast as people make new discoveries.
I learned by reading articles, success stories, and failure stories, but mostly by doing: trying things, seeing how they work, adjusting, and burning a lot of tokens along the way.
In your shoes, I would ask an AI chat assistant to find recent articles on the subject (including on HN) and explain how Codex, Claude, and Pi manage agents.
My compressed view is this: you need a great specification, both business-wise and architecture-wise, that doesn't leave anything important for the model to guess, because chances are it will make the wrong choices. That comprehensive spec should not be one huge chunk. Divide your plan into phases that each fit in a context window, and write a spec for each phase. Use TDD and strive for 100% coverage. Force the model to behave: if it doesn't do what it is supposed to, give it feedback, ask it to retry, and don't let it progress to the next stage until everything is perfect.
I also like to write comprehensive integration tests before building anything. The agents are not allowed to read or modify the integration tests, only to run them, and they get feedback on where the tests fail. I write the integration tests in a different language from the software I'm building, to make sure the tests don't rely on anything platform-specific. I use C#, Go, Rust, and Zig for development and Python for the integration tests.
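Here is a minimal sketch of the "don't let it progress until everything passes" gate, written in Python since that's the integration-test language here. The `agent` callable is a hypothetical stand-in for whatever drives Codex, Claude, or Pi; `Phase`, `run_phases`, and `max_attempts` are illustrative names, not any tool's API. The harness runs the test command itself and feeds back only the failure output, so the agent never sees the test source.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Phase:
    """One phase of the plan, with a spec small enough to fit in one context window."""
    name: str
    spec: str
    test_cmd: Sequence[str]  # tests the harness runs; the agent never reads or edits them


def run_tests(cmd: Sequence[str]) -> tuple[bool, str]:
    """Run the test command in a subprocess and return (passed, combined output)."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def run_phases(
    phases: Sequence[Phase],
    agent: Callable[[str], None],
    max_attempts: int = 3,
    runner: Callable[[Sequence[str]], tuple[bool, str]] = run_tests,
) -> list[str]:
    """Drive the agent phase by phase; never advance until the phase's tests pass."""
    completed: list[str] = []
    for phase in phases:
        prompt = phase.spec
        for attempt in range(1, max_attempts + 1):
            agent(prompt)  # hypothetical: the agent edits the codebase based on the prompt
            passed, output = runner(phase.test_cmd)
            if passed:
                completed.append(phase.name)
                break
            # Feed back only the failure output, never the test source.
            prompt = (
                f"{phase.spec}\n\n"
                f"Attempt {attempt} failed the tests. Failure output:\n{output}\n"
                "Fix the code and try again."
            )
        else:
            # Retries exhausted: stop instead of moving on to the next phase.
            raise RuntimeError(
                f"phase {phase.name!r} did not pass after {max_attempts} attempts"
            )
    return completed
```

In practice `test_cmd` might be something like `["python", "-m", "pytest", "integration_tests/"]`, pointed at a directory the agent has no read or write access to; enforcing that restriction (sandboxing, file permissions) is outside this sketch. Raising after `max_attempts` instead of continuing is what keeps a half-working phase from becoming the foundation of the next one.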
For now, to get good results, I can't just copy and paste the setup from one project to another; I have to put a lot of work into tailoring the process to each new codebase.
That's why I'm working on an agent harness that tries to force the agents to do the right things in the most common development scenarios without wasting many tokens. Covering common development scenarios is a large goal, so right now I'm working toward backend web development and microservices.