There are all kinds of memory hacks, tools that index your code, etc.
The thing I have found that makes things work much better is, wait for it... Jira.
Everyone loves to hate on Jira, but it is a mature platform for managing large projects.
First, I use the Jira Rovo MCP (or CLI, I don't wanna argue about that) to have Claude Code plan and document my architecture, features, etc. I then manually review and edit all of these items. Then, in a clean session (or several), I have it implement the work and document its decisions in comments, etc. Everything works so much more reliably for large-ish projects like this.
When I first started doing this in my solo projects it was a major, "well, yeah, duh," moment. You wouldn't ask a human dev to magically have an entire project in their mind, why ask a coding agent to do that? This mental model has really helped me use the tools correctly.
edit: then there is context window management. I use Opus 4.6 1M all the time, but if I get much past 250k usage, that means I have done a poor job of starting new sessions. I never hit the auto-compact state. It is a universal truth that LLMs get dumber the more context you give them.
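To make that rule of thumb concrete, here is a minimal sketch of the check I mean. This is plain Python, not Claude Code's actual status bar config; the `ContextBudget` name and its fields are made up for illustration, and the 1M / 250k numbers are just the ones from my own habit above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Hypothetical helper: a soft token limit well below the full window."""

    window_tokens: int = 1_000_000     # assumed model context window
    soft_limit_tokens: int = 250_000   # personal "start a new session" threshold

    def status(self, used_tokens: int) -> str:
        """Return a one-line summary suitable for a status bar."""
        if used_tokens < 0:
            raise ValueError("used_tokens must be non-negative")
        pct = 100 * used_tokens / self.window_tokens
        # Past the soft limit, nudge toward a fresh session instead of
        # waiting for the window to fill up.
        advice = "start a new session" if used_tokens > self.soft_limit_tokens else "ok"
        return f"{used_tokens:,} / {self.window_tokens:,} tokens ({pct:.1f}%): {advice}"


if __name__ == "__main__":
    budget = ContextBudget()
    print(budget.status(100_000))
    print(budget.status(300_000))
```

Where the `used_tokens` number comes from depends on your tooling, so that part is left out on purpose.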
I think everyone should implement the context status bar config to keep an eye on usage:
Doesn't require Jira but yes, specification-first is the way to get better (albeit still not reliably good) results out of AI tools. Some people may call this "design-first" or "architecture-first". The point is really to think through what is being built before asking AI to write the implementation (i.e. code), and to review the code to make sure it matches the intended design.
Most people run into problems (with or without AI) when they write code without knowing what they're trying to create. Sometimes that's useful and fun and even necessary, to explore a problem space or toy with ideas. But eventually you have to settle on a design and implement it, or you just end up with an unmaintainable mess of code (whether it's a pure-human or an AI-assisted mess doesn't matter lol).
But even spec-first, using Opus 4.6 with plan mode, the output is merely good, not great. It isn't bad though, and the fixes are often minor, but you _have_ to read the output to keep the quality decent. Notably, I found that LLMs dislike removing code that serves no active purpose. Completely dead code they do remove, but if the dead code has tests that still call it, it stays.
And small quality stuff. Just yesterday it used a static method where a class method was optimal. A lot of very small stuff I used to call my juniors out on during reviews.
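For anyone wondering why that distinction matters, here is a minimal sketch of one common case, assuming Python and an alternative-constructor pattern. It is not the actual code from that review; `Shape`, `Tile`, and both factory methods are made-up names.

```python
class Shape:
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    @staticmethod
    def square_static(side: float) -> "Shape":
        # Hard-codes the class name, so subclasses get a plain Shape back.
        return Shape(side, side)

    @classmethod
    def square(cls, side: float) -> "Shape":
        # Uses whichever class the method was called on.
        return cls(side, side)


class Tile(Shape):
    """Hypothetical subclass that inherits both factories unchanged."""


if __name__ == "__main__":
    print(type(Tile.square_static(2)).__name__)
    print(type(Tile.square(2)).__name__)
```

Both versions work fine until someone subclasses, which is exactly why it tends to slip through unless a reviewer is looking for it.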
On the other hand, it used an elegant trick to make the code more readable, but failed to use the same trick elsewhere for no reason. I'm not saying it's bad: I probably wouldn't have thought of it myself, and would have kept the worse solution. But even when Claude is smarter than I am, I still have to review its work.
(All the discourse around AI did wonders for my imposter syndrome though)