From the article:
> We recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project. Then, in your CLAUDE.md file, you can include a list of these files with a brief description of each, and instruct Claude to decide which (if any) are relevant and to read them before it starts working.
I've been doing this since the early days of agentic coding, though I've always personally referred to it as the Table-of-Contents approach to keep the context window relatively streamlined. Here's a snippet of my CLAUDE.md file that demonstrates this approach:
# Documentation References
- When adding CSS, refer to: docs/ADDING_CSS.md
- When adding assets, refer to: docs/ADDING_ASSETS.md
- When working with user data, refer to: docs/STORAGE_MANAGER.md
Full CLAUDE.md file for reference: https://gist.github.com/scpedicini/179626cfb022452bb39eff10b...
I don't get the point. Point it at your relevant files, ask it to review and discuss the update, refine its understanding, and then tell it to go.
I have found that more context, comments, and info damage quality on hard problems.
For a long time now I've actually kept two views of my code:
1. The raw code, with no empty space or comments.
2. The code with comments.
I never give the second one to my LLM. The more context you give, the lower its upper end of quality becomes. This is just a habit I've picked up from using LLMs every day, hours a day, since GPT-3.5; it allows me to reach farther into extreme complexity.
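For illustration, a minimal sketch of how that first, stripped-down view could be produced for Python source using only the standard library. The function name and the choice of tokenize are assumptions, not necessarily how the setup described above works:

```python
"""Minimal sketch: emit a 'raw' view of a Python file, with comments and
blank lines stripped, for pasting into an LLM. Assumes valid Python source;
docstrings are left untouched."""
import io
import pathlib
import sys
import tokenize


def strip_view(source: str) -> str:
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Drop comment tokens, then rebuild the source text.
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    rebuilt = tokenize.untokenize(kept)
    # Remove lines that are now empty or whitespace-only.
    return "\n".join(
        line.rstrip() for line in rebuilt.splitlines() if line.strip()
    )


if __name__ == "__main__":
    print(strip_view(pathlib.Path(sys.argv[1]).read_text()))
```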
I suppose I don't know what most people are using LLMs for, but the higher the complexity of your work, the less noise you should inject into it. It's tempting to add massive amounts of context, but I've routinely found that fails at the higher levels of coding complexity and uniqueness. It was more apparent in earlier models; newer ones will handle tons of context, you just won't be able to get those upper ends of quality.
The compute-to-information ratio is all that matters. Compute is capped.
There is a far easier way to do this, and one that is perfectly aligned with how these tools work.
It is called documenting your code!
Just write what this file is supposed to do in a clear, concise way. It acts as a prompt, provides much-needed context specific to the file, and is used only when necessary.
Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not rocket science.
What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
You don't have to "prompt it just the right way".
What you have to do is use the same good old best practices.
Probably a lot of people here disagree with this sentiment, but my take is that if setting up all the AI infrastructure and onboarding it to my code is going to take this amount of effort, then I might as well code the damn thing myself, which is what I'm getting paid to do (and enjoy doing anyway).
Funny how this is exactly the documentation you'd need to make it easy for a human to work with the codebase. Perhaps this'll be the greatest thing about LLMs -- they force people to write developer guides for their code. Of course, people are going to ask an LLM to write the CLAUDE.md and then it'll just be more slop...
I’m sure I’m just working like a caveman, but I simply highlight the relevant code, add it to the chat, and talk to these tools as if they were my colleagues, and I’m getting pretty good results.
As recently as 6 to 12 months ago this was not the case (with or without .md files); I was getting mainly subpar results, so I’m assuming that the models have improved a lot.
Basically, I found that they don’t make that much of a difference; the model is either good enough or not…
I know (or at least I suppose) that these markdown files could bring some marginal improvements, but at this point, I don’t really care.
I assume this is an unpopular take, because I see so many people treat these files as if they were black magic or a silver bullet that 100x their already 1000x productivity.
I was waiting for someone to build this so that I can chuck it into CLAUDE and tell it how to write good MD.
I have found enabling the codebase itself to be the “Claude.md” to be most effective. In other words, set up effective automated checks for linting, type checking, unit tests etc and tell Claude to always run these before completing a task. If the agent keeps doing something you don’t like, then a linting update or an additional test often is more effective than trying to tinker with the Claude.md file. Also, ensure docs on the codebase are up to date and tell Claude to read relevant parts when working on a task and of course update the docs for each new task. YMMV but this has worked for me.
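For illustration, a minimal sketch of such a single entry point; the tool names (ruff, mypy, pytest) and the script path are placeholders for whatever the project actually uses, and the CLAUDE.md side can then be a single line like "Always run python scripts/check.py before finishing a task":

```python
#!/usr/bin/env python3
"""One command the agent is told to run before declaring a task done.
The specific tools below are placeholders; substitute your project's
linter, type checker, and test runner."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # lint
    ["mypy", "."],           # type check
    ["pytest", "-q"],        # unit tests
]


def main() -> int:
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}", flush=True)
        if subprocess.run(cmd).returncode != 0:
            return 1  # fail fast so the failing check is the last output
    print("all checks passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```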
That paper the article references is old at this point. No GPT 5.1, no Gemini 3, which both were game changers. I'd love to see their instruction following graphs.
I have Claude itself write CLAUDE.md. Once it is informed of its context (e.g., "README.md is for users, CLAUDE.md is for you") you can say things like, "update readme and claudemd" and it will do it. I find this especially useful for prompts like, "update claudemd to make absolutely certain that you check the API docs every single time before making assumptions about its behavior" — I don't need to know what magick spell will make that happen, just that it does happen.
Oh yeah I added a CLAUDE.md to my project the other day: https://github.com/grishka/Smithereen/blob/master/CLAUDE.md
Is it a good one?
I've already forgotten about CLAUDE.md; I generate and update it with AI, and I prefer to keep design, tasks, and docs folders instead. It is always better to ask it to read some spec docs and the real code first before doing anything.
I've gotten quite a bit of utility out of my current setup[0]:
Some explicit things I found helpful: Have the agent address you as something specific! This way you know if the agent is paying attention to your detailed instructions.
Rationality, as in the stuff practiced on early Less Wrong, gives a great language for constraining the agent, and since it's read The Sequences and everything else, you can include pointers; the more you do, the more it will nudge the agent into that mode of thought.
The explicit "This is what I'm doing, this is what I expect" pattern has been hugely useful for both me monitoring it/coming back to see what it did, and it itself. It makes it more likely to recover when it goes down a bad path.
The system reminder this article mentions is definitely there but I have not noticed it messing much with adherence. I wish there were some sort of power user mode to turn it off though!
Also, this is probably too long! But I have been experimenting and iterating for a while, and this is what is working best currently. Not that I've been able to hold any other part constant -- Opus 4.5 really is remarkable.
[0]: https://gist.github.com/ctoth/d8e629209ff1d9748185b9830fa4e7...
The advice here seems to assume a single .md file with instructions for the whole project, but the AGENTS.md methodology, as supported by agents like GitHub Copilot, is to break out more specific AGENTS.md files in the subdirectories of your codebase. I wonder how, and whether, the tips shared here change assuming a flow with a bunch of focused AGENTS.md files throughout the code.
Interesting selection of models for the "instruction count vs. accuracy" plot. Curious when that was done and why they chose those models. How well does ChatGPT 5/5.1 (and codex/mini/nano variants), Gemini 3, Claude Haiku/Sonnet/Opus 4.5, recent grok models, Kimi 2 Thinking etc (this generation of models) do?
I've been very satisfied with creating a short AGENTS.md file with the project basics, and then also including references to where to find more information / context, like a /context folder that has markdown files such as app-description.md.
"You can investigate this yourself by putting a logging proxy between the claude code CLI and the Anthropic API using ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that I never know which tool to favour for doing that when you're not a system or network expert.
Here is my take on writing a good CLAUDE.md. I had very good results with my 3-file approach, which has also been inspired by the great blog posts that Human Layer publishes from time to time: https://github.com/marcuspuchalla/claude-project-management
What's the actual completion rate for Advent of Code? I'd bet the majority of participants drop off before day 25, even among those aiming to complete it.
Is this intentional? Is AoC designed as an elite challenge, or is the journey more important than finishing?
None of this should be necessary if these tools did what they say on the tin, and most of this advice will probably age like milk.
Write readmes for humans, not LLMs. That's where the ball is going.
It seems like a good set of guidelines overall. I appreciate some of the observations being backed up by data.
What I find most interesting is how a hierarchical / recursive context construct begins to emerge. The authors' note about a "root" claude.md, as well as the opening comments on LLMs being stateless, rings a bell for me. I think we will soon start seeing stateful LLMs, via clever manipulation of scope and context: something akin to memory, as we humans perceive it.
I think this could work really well for infrastructure/ops style work where the LLM will not be able to grasp the full context of say the network from just a few files that you have open.
But as others are saying this is just basic documentation that should be done anyway.
I have been using Claude.md to stuff way too many instructions so this article was an eye opener. Btw, any tips for Claude.md when one uses subagents?
Ha, I just tell Claude to write it. My results have been generally fine, but I only use Claude on a simple codebase that is well documented already. Maybe I will hand-edit it to see if I can see any improvements.
I was expecting the traditional AI-written slop about AI, but this is actually really good. In particular, the "As instruction count increases, instruction-following quality decreases uniformly" section and associated graph is truly fantastic! To my mind, the ability to follow long lists of rules is one of the most obvious ways that virtually all AI models fail today. That's why I think that graph is so useful -- I've never seen someone go and systematically measure it before!
I would love to see it extended to show Codex, which to my mind is by far the best at rule-following. (I'd also be curious to see how Gemini 3 performs.)
Is CLAUDE.md required when claude has a --continue option?
It would be nice to see an actual example of what a good claude.md that implements all of these recommendations looks like.
The only good Claude.md is a deleted Claude.md.
"Here's how to use the slop machine better" is such a ridiculous pretense for a blog or article. You simply write a sentence and it approximates it. That is hardly worth any literature being written as it is so self obvious.
> Claude often ignores CLAUDE.md
> The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file
CLAUDE.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file.
A friend of mine tells Claude to always address him as “Mr Tinkleberry”; he says he can tell Claude is not paying attention to the instructions in CLAUDE.md when Claude stops calling him “Mr Tinkleberry” consistently.