> Principle 1: Share context, and share full agent traces, not just individual messages
I was playing around with this task: give a prompt to a low-end model, get the response, and then have a higher-end model evaluate the quality of that response.
And one thing I've noticed: while the higher-end model sometimes detects when the low-end model has misinterpreted the prompt (e.g. it blatantly didn't understand some aspect of it and just hallucinated), it still often allows itself to be controlled by the low-end model's framing... e.g. if the low-end model takes a negative attitude to an ambiguous text, the high-end model will propose moderating the negativity... but what it doesn't realise is that, given the prompt without the low-end model's response, it might not have adopted that negative attitude at all.
So one idea I had: a tool which lets the LLM form its own "first impression" of a text... it gives itself the prompt, sees how it would react without the framing of the other model's response, and then uses that reaction as additional input into its evaluation...
So this is an important point the post doesn't seem to understand – sometimes less is more, and leaving stuff out of the context can be more useful than putting it in.
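To make the "first impression" idea concrete, here is a minimal sketch of what such a tool might look like. This is purely hypothetical: `call_model`, `evaluate`, and the model name `"high-end"` are all placeholder names I've made up, and `call_model` is a stub standing in for whatever real LLM API you'd use.

```python
# Hypothetical sketch: before judging a low-end model's response, the judge
# first reads the prompt alone, so it has an unframed baseline to compare
# the candidate's framing against.

def call_model(model: str, prompt: str) -> str:
    # Placeholder: swap in a real API call to your provider here.
    return f"[{model} response to: {prompt[:40]}...]"

def evaluate(prompt: str, candidate_response: str, judge: str = "high-end") -> str:
    # Step 1: the judge sees only the original prompt, with no framing
    # from the candidate model's response.
    first_impression = call_model(judge, prompt)

    # Step 2: the judge evaluates the candidate, with its own independent
    # impression included as an anchor against adopting the candidate's tone.
    eval_prompt = (
        f"Task prompt:\n{prompt}\n\n"
        f"Your own first impression of the task, formed independently:\n"
        f"{first_impression}\n\n"
        f"Candidate response to evaluate:\n{candidate_response}\n\n"
        "Judge the candidate. Where its framing differs from your first "
        "impression, point that out explicitly rather than adopting it."
    )
    return call_model(judge, eval_prompt)
```

The point of the two-step structure is that step 1 happens in a context that deliberately excludes the candidate's output, which is exactly the "less is more" case above.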
> It turns out subagent 1 actually mistook your subtask and started building a background that looks like Super Mario Bros. Subagent 2 built you a bird, but it doesn’t look like a game asset and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications
It seems to me there is another way to handle this: allow the final agent to go back to the subagent and say "hey, you did the wrong thing, this is what you did wrong, please try again". Maybe with a few iterations it will get it right. At some point you need to cap the iterations to prevent an endless loop, and then either the final agent does what it can with the flawed response, or it escalates to a human for manual intervention (even the human intervention can be a long-running tool...)