logoalt Hacker News

lanstintoday at 6:54 PM1 replyview on HN

Your context isn’t to give it orders, they just don’t work like that. Your context (AGENTS.me, skills, per-request context we are sending in for each request to bots) is to give it the info it needs in the language category it’s trained for the answers you want; you have to give it a clear instruction each prompt. Basically, when you have a long session, you can see this by saying, ok, now moving onto another thing, blah blah blah (implicitly ignoring all previous instructions). It can even back fire - nagging too much about don’t skip tests in the context can make it slip into the linguistic space where there is some emergency and faking the results might be justified (I imagine there is a certain amount of training out there “just making the tests pass for now, will fix later, I promise.” If you rarely mention tests except “this one is failing, please investigate what is going on” (an informational outcome not a test outcome), it doesn’t really “cheat” (tho it can leap to conclusions as always). The tests need to be some deterministic step in the process anyways, tests don’t need fuzzy word directed search capabilities. But the models just don’t have the structure to allow feeding in a ten page set of rules and follow them. You can add a step to say, please check this git commit for compliance with the 23 rules in this standards file, and it will work better to catch the gaps.


Replies

zamalektoday at 9:01 PM

> Basically, when you have a long session, you can see this by saying, ok, now moving onto another thing, blah blah blah

I try to avoid > 200k contexts, as the 1M context is where I first saw the massive decrease in reliability.

And my AGENTS is really short, and I said it was ignoring decisions in the prompt.