Hacker News

jameslk · today at 3:01 AM · 1 reply

> I queued the work and let it run. First task came back good. Second came back good. Somewhere around hour four the quality started sliding. By hour six the agent was cutting corners I’d specifically told it not to cut, skipping steps I’d explicitly listed, behaving like I’d never written any of the rules down.

> …

> When I write a prompt, the agent doesn’t just read the words. It reads the shape. A short casual question gets read as casual. A long precise document with numbered rules gets read as… not just the rules, but also as a signal. “The user felt the need to write this much.” “Why?” “What’s going on here?” “What do they really want?”

This is an interesting premise, but based on the information supplied, I don’t think it’s the only possible conclusion. Yet the whole essay seems to assume it is true and then builds its arguments on top of it.

I’ve run into this dilemma before. It happens when there’s a TON of information in the context. LLMs start to lose track of details as the context grows (e.g. context rot[0]). LLMs also keep making the same mistakes once the information is in the prompt, regardless of attempts to convey that it’s undesired[1].
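FWIW, one way to separate these hypotheses is to hold the task and rules constant and vary only the amount of unrelated context, then check whether rule-following degrades with length. A minimal sketch (these helpers are hypothetical, not from the article; you'd wire in your own model call):

```python
def build_prompt(task: str, rules: list[str], filler_tokens: int = 0) -> str:
    """Same task and rules every time, optionally padded with unrelated filler.

    Only `filler_tokens` varies between runs, so any quality drop on the
    padded version points at context length rather than prompt "shape".
    """
    filler = ("lorem ipsum " * (filler_tokens // 2)).strip()
    parts = ["Rules:"] + [f"{i + 1}. {r}" for i, r in enumerate(rules)]
    if filler:
        parts += ["Background (unrelated):", filler]
    parts += ["Task:", task]
    return "\n".join(parts)


def rule_adherence(output: str, rules: list[str]) -> float:
    """Crude proxy metric: fraction of rules whose text shows up in the output.

    A real harness would use per-rule checks; this is just the shape of it.
    """
    hits = sum(1 for r in rules if r.lower() in output.lower())
    return hits / len(rules)


rules = ["cite line numbers"]
short = build_prompt("Summarize the diff", rules, filler_tokens=0)
padded = build_prompt("Summarize the diff", rules, filler_tokens=50_000)
# Send both prompts to the model and compare rule_adherence() on the two
# responses; if adherence only drops for the padded prompt, the problem is
# context length, not how the instructions were phrased.
```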

I think these issues are just as plausible an explanation for what the author was facing, unless this is happening with much less information in the context.

0. https://www.trychroma.com/research/context-rot

1. https://arxiv.org/html/2602.07338v1


Replies

perrygeo · today at 3:31 AM

It's more than context-rot.

If you ask a vague, ignorant question, you get back authoritative summaries. If you make a specific request, each statement is taken literally. The quality of the answer depends on the quality of the question.

And I'm not using "quality" to mean good/bad. I mean literally qualitative, not quantifiable. Tone. Affect. Personality. Whatever you call it. Your input tokens shape the pattern of the output tokens. It's a model of human language; is that really so surprising?