logoalt Hacker News

csohamyesterday at 1:59 PM1 replyview on HN

Really intresting. What did the original prompt look like? Perhaps the original prompt was not that good? I feel like the changes claude suggested (except a couple maybe) are already pretty well known prompt engineering practices.


Replies

blndrtyesterday at 2:32 PM

Thank you for the feedback!

In this (telecom) benchmark you can review agent policies and manuals here: 1) https://github.com/sierra-research/tau2-bench/blob/main/data... 2) https://github.com/sierra-research/tau2-bench/blob/main/data...

Of course these are just parts of the prompt, you can inspect benchamark code to see how these are rendered to actual LLM calls.

In case someone is not familiar with framework methodology I've wrote a separate article covering that (with some of my thoughts) -> https://quesma.com/blog/tau2-from-llm-benchmark-to-blueprint...