Hacker News

skybrian · today at 12:35 AM

For a coding agent, the project "learns" as you improve its onboarding docs (AGENTS.md), code, and tests. If you assume you're going to start a new conversation for each task and the LLM is a temp that's going to start from scratch, you'll have a better time.
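
As a rough sketch of that workflow (the helper names here are made up, not any particular agent's API): every task starts a fresh conversation that gets re-primed from the repo's AGENTS.md, so improving that file is how the project "learns".

```python
from pathlib import Path

def run_agent(system_prompt: str, task: str) -> str:
    """Placeholder for whatever coding-agent / LLM API you actually use."""
    return f"[agent output for: {task}]"

def do_task(repo: Path, task: str) -> str:
    # Re-read the onboarding docs every time: the "memory" lives in the repo,
    # not in any one conversation.
    agents_md = repo / "AGENTS.md"
    onboarding = agents_md.read_text() if agents_md.exists() else ""
    return run_agent(system_prompt=onboarding, task=task)

# Each task is handled by a fresh "temp" that starts from the docs.
for task in ["add retry logic to the fetcher", "fix the flaky upload test"]:
    print(do_task(Path("."), task))
```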


Replies

stingraycharles · today at 7:41 AM

But these docs are the notes: the agent constantly needs to be re-primed with them, and that approach doesn't scale. How much of this knowledge can you really put in agent docs? There's only so much you can capture, and for any serious-scale project there's SO much knowledge that needs capturing. Not just "do this, do that", but also the context around why certain decisions were made (rationale, business context, etc.).

It's exactly akin to a human who has to write everything down in notes and re-read them every time.

falcor84 · today at 12:46 AM

But that's the thing: Claude Plays Pokemon is an experiment in having Claude work fully independently, so there's no "you" to improve its onboarding docs or anything else; it has to do so on its own. And as long as it cannot do that reliably, it effectively has anterograde amnesia.

And just to be clear, I'm mentioning this because I think Claude Plays Pokemon is a playground for the problems any agentic AI hits when doing long-term independent work; I believe the solution needed here will bring us closer to fully independent agents in coding and other domains. It reminds me of the codeclash.ai benchmark, where similar issues show up across multiple "rounds" of an AI working on the same codebase.

PunchyHamster · today at 1:50 PM

mfw people do better documentation for AI than for other people in the project

skerit · today at 10:20 AM

I agree; I started doing something like that a while ago.

I've had great success using Claude Opus 4.5, as long as I hold its hand very tightly.

Constantly updating the CLAUDE.md file, adding an FAQ to my prompts, making sure it remembers what it tried before and what the outcome was. It became a lot more productive after I started doing this.

Using the "main" agent as an orchestrator, and making it do any useful work or research in subagents, has also really helped to make useful sessions last much longer, because as soon as that context fills up you have to start over.

Compaction is fucking useless. It tries to condense ~160,000 tokens into a few thousand, and for anything even slightly complex that won't work. So my "compaction" is very manual: I keep track of most of the things it has said during the session and what resulted from that. It reads a lot more like a transcript of the session, without _any_ of the actual tool call results. And this has worked surprisingly well.
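
A minimal sketch of that manual transcript-style "compaction" (the message format here is invented for illustration): keep what the agent said and what came of it, drop the bulky raw tool output, and feed the transcript back in when starting the next session.

```python
def build_transcript(messages: list[dict]) -> str:
    lines = []
    for m in messages:
        if m["role"] == "tool":
            continue                      # skip raw tool output entirely
        lines.append(f'{m["role"]}: {m["content"]}')
    return "\n".join(lines)

session = [
    {"role": "user", "content": "Why does the upload test fail?"},
    {"role": "assistant", "content": "I'll grep for the test and read it."},
    {"role": "tool", "content": "<3,000 lines of grep/file output>"},
    {"role": "assistant", "content": "The test assumes the bucket already exists."},
]

# Prime the next session with this instead of an auto-compacted summary.
print(build_transcript(session))
```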

In the past I've tried various ways of automating this process, but it's never really turned out great. And none of the LLMs are good at writing _truly_ useful notes.

kaashif · today at 12:44 AM

Yeah but it feels terrible. I put as much as I can into Claude skills and CLAUDE.md but the fact that this is something I even have to think about makes me sad. The discrete points where the context gets compacted really feel bad and not like how I think AGI or whatever should work.

Just continuously learn and have a super duper massive memory. Maybe I just need a bazillion GPUs to myself to get that.

But no one wants to manage context all the time; it's incidental complexity.

formerly_proven · today at 10:28 AM

The way amp does this explicitly with threads and hand-offs (and of course the ability to summarize or fetch parts of other threads on demand, rather than eagerly the way compaction essentially works) makes a ton of sense imho for the way LLMs currently work. "Infinite scroll but not actually" is an inferior approach. I'm surprised others aren't replicating this; it's easy to understand, simple to implement, and works well.
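
To be clear, this is not amp's actual API, just a sketch of the shape of the idea: a new thread starts from a short hand-off note, and summaries of older threads are pulled in lazily, only when they're actually needed, instead of being eagerly compacted in up front.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    id: str
    messages: list[str] = field(default_factory=list)
    handoff: str = ""                        # short note written at the end of the thread

    def summarize(self) -> str:
        # stand-in for an LLM-generated summary, produced on demand
        return f"[summary of thread {self.id}: {len(self.messages)} messages]"

archive: dict[str, Thread] = {}

def start_thread(id: str, parent: Thread | None = None) -> Thread:
    seed = [parent.handoff] if parent else []   # only the hand-off is carried over
    return Thread(id=id, messages=seed)

def fetch_context(thread_id: str) -> str:
    return archive[thread_id].summarize()       # fetched lazily, only if asked for

t1 = Thread("t1", ["...long exploration..."], handoff="Plan: split parser into lexer + ast.")
archive[t1.id] = t1
t2 = start_thread("t2", parent=t1)
print(t2.messages)              # the new thread starts small
print(fetch_context("t1"))      # old detail pulled only on demand
```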