Hacker News

vanillameow, today at 7:45 AM

I'm surprised to see this getting so much positive reception. In my experience AI is still really bad with documenting the exact steps it took, much more so when those are dependent on its environment, and once there's a human in the loop at any point you can completely throw the idea out the window. The AI will just hallucinate intermediate steps that you may or may not have taken unless you spell out in exact detail every step you took.

People in general seem super obsessed with AI context, bordering on psychosis. Even setting aside obvious examples like Gas Town or OpenClaw or that tweet I saw the other day of someone putting their agents in scrum meetings (lol?), this is exactly the kind of vague LLM "half-truth" documentation that will cascade into errors down the line. In my experience, AI works best when the ONLY thing it has access to is GROUND TRUTH HUMAN VERIFIED documentation (and a bunch of shell tools obviously).

Nevertheless it'll be interesting to see how this turns out, prompt injection vectors and all. Hope this doesn't have an admin API key in the frontend like Moltbook.


Replies

bonoboTP, today at 10:53 AM

That can happen if the history got compacted away in a long session. But AI agents usually also have a way to re-read the entire log from disk. E.g., Claude Code stores all user messages, LLM messages, thinking traces, tool calls, etc. in JSON files that the agent can query. In my experience it does this very well. But the AI might not reach for those logs unless asked directly. I can see that it could be more proactive, but this is certainly not some fundamental AI limitation.
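To make "query the log" concrete, here is a minimal sketch of what that could look like. Everything about the schema is an assumption for illustration: it supposes a JSON Lines file where each record is an object with a `type` field and tool-call records carry `tool` and `input` fields. That is not Claude Code's actual on-disk format, and `load_records` and `steps_taken` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_records(log_path):
    """Read a JSON Lines log: one JSON object per non-empty line."""
    records = []
    with Path(log_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip partially written or corrupt lines rather than abort.
                continue
    return records


def steps_taken(records):
    """Return (tool, input) pairs, in log order, for tool-call records.

    Assumed (hypothetical) schema: {"type": "tool_call", "tool": ..., "input": ...}
    """
    return [
        (r.get("tool"), r.get("input"))
        for r in records
        if isinstance(r, dict) and r.get("type") == "tool_call"
    ]
```

The point is only that the steps come from what was recorded on disk, not from what the model remembers after compaction, so nothing in the list is reconstructed from memory.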

latand6, today at 8:05 AM

I have a completely different experience. Which models are you talking about? I have no trouble at all with AI documenting the steps it took. I use codex gpt5.4 and Claude code opus 4.6 daily. When needed, they have no issue describing what steps they took and what problems came up during the run. I document all of that as a SKILL, then reuse it and fix the instructions based on further feedback.
