I'd guess the same has always been true for READMEs / human dev docs. Of course it doesn't transfer directly but still feels incredible to be in an age where we can measure such (previously) theoretical things with synthetic programmers.
the harness (skills, context, memory, state and past decisions, implementation history, etc.) should live in your repo so that you can freely switch IDEs/CLIs and models. full portability. don't let OpenAI or Anthropic own your work. https://recursive-mode.dev/introduction
Most of my projects are without an AGENTS.md/CLAUDE.md at the moment. I've found that if the project itself is in good shape - clear docs, comprehensive tests - you don't need to tell the coding agent much in order for it to be productive.
I start a whole lot of my sessions with "Run tests with 'uv run pytest'" and once they've done that they get the idea that they should write tests in a style that fits the existing ones.
I suspect the harness (of which AGENTS.md, skills, and similar things are part) should be abstracted away for better overall performance. This article doesn't really go into detail about model preferences, but some other benchmarks show that different models have different preferences for how to use certain tools (probably related to their post-training material), and it should really be managed invisibly to me as the end user.
Also curious how well LLMs can self-reflect in a loop, in terms of, here's how the previous iteration went, here's what didn't go well, here's feedback from the human, how do I modify the docs I use in a way that I know I'll do better next time.
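One shape that loop could take, as a minimal sketch: the agent runs, and on failure its working docs get amended with the mistakes and human feedback before the next attempt. Everything here is hypothetical (`run_agent`, the outcome dict shape, the section heading), not any real agent API.

```python
def revise_docs(docs: str, outcome: dict) -> str:
    """Append lessons from a failed iteration to the working docs."""
    lessons = [f"- Avoid: {m}" for m in outcome.get("mistakes", [])]
    lessons += [f"- Human feedback: {f}" for f in outcome.get("feedback", [])]
    if not lessons:
        return docs
    return docs + "\n## Lessons from last run\n" + "\n".join(lessons)

def self_reflect_loop(task, docs, run_agent, max_iters=3):
    """Retry a task, folding each iteration's failures back into the docs.

    `run_agent(task, docs)` is a placeholder that should return a dict like
    {"ok": bool, "mistakes": [...], "feedback": [...]}.
    """
    outcome = {"ok": False, "mistakes": [], "feedback": []}
    for _ in range(max_iters):
        outcome = run_agent(task, docs)
        if outcome["ok"]:
            break
        docs = revise_docs(docs, outcome)
    return docs, outcome
```

Whether the model can actually write useful lessons for its future self (rather than generic platitudes) is the open question the comment raises; the loop itself is the easy part.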
I know you can somewhat hillclimb via DSPy but that's hard to generalize.
It's cool that they did some measurements, but unfortunately there's not much to learn from the article unless you're using really outdated files that you wrote by hand. The agent should know how to write a good file.
For existing files, the agent will carry on a bad structure unless you specifically ask it to refactor and think about what's actually helpful.
In general, it should be a lean file that tells the agent how to work with the project (short description, table of commands, index of key docs, supporting infra, handful of high-level rules and conventions that apply to everything). Occasionally ask the agent to review and optimize the file, particularly after model upgrades.
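For illustration, a lean file in that spirit might look like this (the project name, commands, and doc paths are all made up):

```markdown
# AGENTS.md

Acme API — a small Flask service for widget management.

## Commands
| Task  | Command               |
|-------|-----------------------|
| Tests | `uv run pytest`       |
| Lint  | `uv run ruff check .` |

## Key docs
- `docs/architecture.md` — service layout
- `docs/deploy.md` — release process

## Conventions
- Match the existing test style in `tests/`
- Never commit directly to `main`
```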
Interesting that they had a 100% read rate of agents.md. In my test repo, agents.md files lower down were occasionally missed by VS Code Copilot. That put me off investing much effort in nesting agents.md files within the repo, and I've been focusing on agent skills instead.
I took a stab at a solution: https://ktext.dev.
Basically a structured context file, that can be used to generate AGENTS.md, and also can be validated and scored.
I think it could help with this problem.
IME, multiple (good) AGENTS.md files are even better. I mostly see them only at the root of a repository, but I spread more out into important subdirectories, where they act as a table of contents and spark notes. Putting more focused AGENTS.md files in important places has been even more helpful.
Bonus points if you can force them into context without needing the agent to make a tool call, based on touching the files or systems near them. (my homegrown agent has this feature)
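Roughly what that spreading-out looks like (directory names are invented for illustration):

```
repo/
├── AGENTS.md            # top-level overview + index of the others
├── services/
│   └── billing/
│       └── AGENTS.md    # billing-specific conventions and commands
└── web/
    └── AGENTS.md        # frontend build/test commands
```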
Will people ever get tired of writing AI how-to slop?
The models are so terrible you have to think ahead of them so they don't make mistakes. This is not an upgrade. This is coping behavior.
I think the main thing a lot of these articles miss is that it's not just your AGENTS.md that can give you a model upgrade, or the inverse.
Everything your harness looks at can. The skills in your codebase, the commands you've added, the memories that were auto-created: they all work toward improving or completely destroying your productivity.
And most of it is hidden. You hear people talk about this all the time: they'll say, "Oh, I use GSD or Superpowers and my results have gotten worse."
Your results might have gotten worse precisely because you use them (along with your memories and other skills).