I have completely different experience. Which models are you talking about? I have no trouble at all with AI documenting the steps it took. I use codex gpt5.4 and Claude code opus 4.6 daily. When needed - they have no issue with describing what steps they took, what were the problems during the run. Documenting that all as a SKILL, then reuse and fix instructions on further feedback.
The steps they say they took and steps they took are not the same thing.
I use mainly Opus 4.6.
I did the same thing and created a skill for summarizing a troubleshooting conversation. It works decently, as long as my own input in the troubleshooting is minimal. i.e. dangerously-skip-permissions. As soon as I need to take manual steps or especially if the conversation is in Desktop/Web, it will very quickly degrade and just assume steps I've taken (e.g. if it gave me two options to fix something, and I come back saying it's fixed, it will in the summary just kind of randomly decide a solution). It also generally doesn't consider the previous state of the system (e.g. what was already installed/configured/setup) when writing such a summary, which maybe makes it reusable for me, somewhat, but certainly not for others.
Now you could say, "these are all things you can prompt away", and, I mean, to an extent, probably. But once you're talking about taking something like this online, you're not working with the top 1% proompters. The average claude session is not the diligent little worker bee you'd want it to be. These models are still, at their core, chaos goblins. I think Moltbook showed that quite clearly.
I think having your model consider someone else's "fix" to your problem as a primary source is bad. Period. Maybe it won't be bad in 3 generations when models can distinguish noise and nonsense from useful information, but they really can't right now.