
dataviz1000 • today at 1:48 PM • 0 replies

I built a test evaluation (they friggen' stole the word harness) that runs a changed prompt and compares it against the previous version on pass/fail, token count, and run time. It is an easy thing to do. The best part is that I set up an orchestration pattern where one agent iterates on the target agent's prompts. It not only evaluates the outcome after each change, it can also update the prompt and rerun, self-healing and fixing itself.
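
For illustration only, here is a rough sketch of what a loop like this could look like. It is not the actual setup described above: every name in it (`evaluate`, `optimize`, `toy_agent`, `toy_proposer`, the `(output, tokens_used)` return shape) is a hypothetical stand-in, and the two "agents" are plain functions rather than real model calls. It assumes success is an exact match against an expected output and that a candidate prompt is kept only if it passes more cases, with fewer tokens as the tie-breaker.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical shape: an agent takes (prompt, task_input) and
# returns (output, tokens_used). A real one would call a model API.
Agent = Callable[[str, str], Tuple[str, int]]
Case = Tuple[str, str]  # (task_input, expected_output)


@dataclass
class EvalResult:
    passed: int
    total: int
    tokens: int
    seconds: float


def evaluate(agent: Agent, prompt: str, cases: List[Case]) -> EvalResult:
    """Run every case with the given prompt; record pass count, tokens, wall time."""
    passed = tokens = 0
    start = time.perf_counter()
    for task_input, expected in cases:
        output, used = agent(prompt, task_input)
        tokens += used
        passed += int(output == expected)
    return EvalResult(passed, len(cases), tokens, time.perf_counter() - start)


def is_better(new: EvalResult, old: EvalResult) -> bool:
    # Assumed policy: more passes wins, ties go to fewer tokens.
    # Time is recorded for reporting but not used for the decision here.
    return (new.passed, -new.tokens) > (old.passed, -old.tokens)


def optimize(agent: Agent,
             propose: Callable[[str, EvalResult], str],
             prompt: str,
             cases: List[Case],
             max_rounds: int = 5) -> Tuple[str, EvalResult]:
    """Let a proposer revise the prompt, re-evaluate, and keep only improvements."""
    best_prompt, best = prompt, evaluate(agent, prompt, cases)
    for _ in range(max_rounds):
        if best.passed == best.total:
            break  # nothing left to fix
        candidate = propose(best_prompt, best)
        result = evaluate(agent, candidate, cases)
        if is_better(result, best):
            best_prompt, best = candidate, result
    return best_prompt, best


def toy_agent(prompt: str, task_input: str) -> Tuple[str, int]:
    # Stand-in for the target agent: upper-cases only if the prompt asks for it.
    output = task_input.upper() if "uppercase" in prompt else task_input
    return output, len(prompt.split()) + len(task_input.split())


def toy_proposer(prompt: str, result: EvalResult) -> str:
    # Stand-in for the orchestrating agent that rewrites the prompt.
    return prompt + " Reply in uppercase."


CASES = [("hello", "HELLO"), ("ok", "OK")]
final_prompt, final = optimize(toy_agent, toy_proposer, "Echo the input.", CASES)
print(final_prompt, final)
```

Keeping a candidate only when it beats the current best means a bad rewrite from the proposer never makes things worse, which is one simple way to get the "rerun and fix itself" behavior.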
