I really feel this bit:
> With agentic coding, part of what makes the models work today is knowing the mistakes. If you steer it back to an earlier state, you want the tool to remember what went wrong. There is, for lack of a better word, value in failures. As humans we might also benefit from knowing the paths that did not lead us anywhere, but for machines this is critical information. You notice this when you are trying to compress the conversation history. Discarding the paths that led you astray means that the model will try the same mistakes again.
I've been trying to find the best ways to record and publish my coding agent sessions so I can link to them in commit messages, because increasingly the work I do IS those agent sessions.
Claude Code defaults to expiring those records after 30 days! Here's how to turn that off: https://simonwillison.net/2025/Oct/22/claude-code-logs/
I share most of my coding agent sessions through copying and pasting my terminal session like this: https://gistpreview.github.io/?9b48fd3f8b99a204ba2180af785c8... - via this tool: https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
Recently been building new timeline sharing tools that render the session logs directly - here's my Codex CLI one (showing the transcript from when I built it): https://tools.simonwillison.net/codex-timeline?url=https%3A%...
And my similar tool for Claude Code: https://tools.simonwillison.net/claude-code-timeline?url=htt...
What I really want is first-class support for this from the coding agent tools themselves. Give me a "share a link to this session" button!
Over time, do you think this process could lock you into an inflexible state?
I'm reminded of the trade-off between automation and manual work. Automation crystallizes process, and thus the system as a whole loses its ability to adapt in a dynamic environment.
Simon, I keep hoping that you will do one of your excellent reviews on Amp. It feels like the one 'major' agentic coding tool that is still flying under the radar. I intend to explore it myself of course but curious your take.
Yes! 100% this. I was talking to friends about this and there's gotta be some value in the sessions leading up to the commit. I doubt a human would read them all while reviewing a PR, but some RAG tool could, and then provide more context to another agent or session. Sometimes in a session I like to talk about previous commits, PRs, and sessions, and I just wish all of this were done automatically.
You can export all agent traces to OTel, either directly or via output logging, then just dump them into ClickHouse with metadata such as repo, git user, cwd, etc.
You can do evals and give agents long-term memory with the exact same infrastructure a lot of people already have for managing ops. No need to retool, just use what's available properly.
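A minimal sketch of what that can look like with the OpenTelemetry Python SDK; the collector endpoint, attribute names, and the example failure event are my own assumptions, not from any particular agent tool:

```python
# Sketch: wrap one agent session in an OTel span, tag it with repo / git user /
# cwd, and ship it over OTLP to whatever collector feeds ClickHouse.
import os
import subprocess

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "coding-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent.sessions")

def run_agent_session(prompt: str) -> None:
    with tracer.start_as_current_span("agent.session") as span:
        span.set_attribute("repo", os.path.basename(os.getcwd()))
        # Assumes git is configured; this is just one way to get the user.
        span.set_attribute("git.user", subprocess.check_output(
            ["git", "config", "user.name"], text=True).strip())
        span.set_attribute("cwd", os.getcwd())
        span.set_attribute("prompt", prompt)
        # ... invoke the agent here, recording failures as span events ...
        # Illustrative placeholder event:
        span.add_event("tool.failure", {"tool": "pytest", "error": "3 tests failed"})
```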
I think we already have the tools but not the communication between them. Instead of having actions taken and failures as commit messages, you should have wide-event style logs with all the context: failures, tools used, steps taken... Those logs could be used as checkpoints to go back to as well, and you could refer back to the specific action ID you walked back to when encountering an error.
In turn, this could all be plain text and made accessible through version control in a repo or in a central logging platform.
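A rough sketch of what one wide event per agent action could look like, written as JSON lines; every field name here is hypothetical, not taken from any particular tool:

```python
# One JSON object per agent action, with enough context to replay the step or
# walk back to it later via its action_id.
import json
import time
import uuid
from pathlib import Path

LOG = Path("agent-sessions.jsonl")

def log_action(session_id: str, step: int, tool: str, args: dict,
               outcome: str, error: str | None = None) -> str:
    action_id = str(uuid.uuid4())
    event = {
        "ts": time.time(),
        "session_id": session_id,
        "action_id": action_id,   # the checkpoint you can refer back to
        "step": step,
        "tool": tool,
        "args": args,
        "outcome": outcome,       # "ok" or "failure"
        "error": error,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(event) + "\n")
    return action_id

# e.g. log_action("sess-42", 7, "run_tests", {"cmd": "pytest"}, "failure", "3 tests failed")
```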
Emacs gptel just produces md or org files.
Of course the agentic capabilities are very much on a roll-your-own-in-elisp basis.
I’d like to make something like this but running in the background, so I can better search my history of sessions and basically start building my own knowledge base of sorts.
there's some research into context layering so you can split / reuse previous chunks of context
ps: your context log apps are very very fun
> There is, for lack of a better word, value in failures
Learning? Isn't that what these things are supposedly doing?
Check out codecast.sh
"all my losses is lessons"
When I find myself in a situation where I’ve been hammering an LLM and it keeps veering down unproductive paths, trying poor solutions or applying fixes that make no difference, but we do eventually arrive at the correct answer, the result is often a massive 100+ KB running context.
To help mitigate this in the future I'll often prompt:
Then I follow up with:

I then add this summary to either the relevant MD file (CHANGING_CSS_LAYOUTS.md, DATA_PERSISTENCE.md, etc.) or, more generally, to the DISCOVERIES.md file, which is linked from my CLAUDE.md under:

I don't think linking to an entire commit full of errors/failures is necessarily a good idea; it feels like it would quickly lead to the proverbial poisoning of the well.
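For what it's worth, the "append the summary" step above is easy to script; a minimal sketch, with a hypothetical helper name and an illustrative summary string:

```python
# Sketch: append a dated session summary to DISCOVERIES.md (or a topic file
# like CHANGING_CSS_LAYOUTS.md). Helper name and example text are hypothetical.
import datetime
from pathlib import Path

def record_discovery(summary: str, target: str = "DISCOVERIES.md") -> None:
    stamp = datetime.date.today().isoformat()
    with Path(target).open("a") as f:
        f.write(f"\n## {stamp}\n\n{summary}\n")

# e.g. record_discovery("Changing the grid layout requires touching X, not Y.",
#                       "CHANGING_CSS_LAYOUTS.md")
```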