Hacker News

sublimefire · yesterday at 8:14 PM

My experience as well.

Prompt changes affect output substantially (just look up arXiv); the difficult part is finding an optimal structure that yields the best results. Doing a lot of testing on your own is expensive, so at the moment it all boils down to feel and experience. Then you mix in tool calls, calls to other agents, and client functions, and it gets terribly hard to evaluate.
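One way to turn "feel" into numbers is to score each prompt variant against the same small set of labelled cases, running several trials per variant because sampled model output is nondeterministic. This is a minimal sketch: `model` is a hypothetical `prompt -> text` callable standing in for a real LLM client, and `fake_model` is a deterministic stub used only so the example runs offline.

```python
from statistics import mean
from typing import Callable

# Hypothetical model interface: takes a full prompt string, returns text.
Model = Callable[[str], str]


def evaluate_prompt(template: str, model: Model,
                    cases: list[tuple[str, str]], trials: int = 3) -> float:
    """Fraction of (input, expected) cases answered exactly right,
    averaged over several trials since real sampling is nondeterministic."""
    scores = []
    for _ in range(trials):
        correct = sum(
            model(template.format(input=x)).strip() == expected
            for x, expected in cases
        )
        scores.append(correct / len(cases))
    return mean(scores)


def compare_prompts(templates: dict[str, str], model: Model,
                    cases: list[tuple[str, str]],
                    trials: int = 3) -> dict[str, float]:
    """Score every named prompt variant on the same labelled cases."""
    return {name: evaluate_prompt(t, model, cases, trials)
            for name, t in templates.items()}


def fake_model(prompt: str) -> str:
    # Hypothetical stub: answers tersely only when asked for one word,
    # otherwise adds chatter that breaks an exact-match check.
    answer = prompt.rsplit(":", 1)[-1].strip().upper()
    return answer if "one word" in prompt else "Sure! " + answer


cases = [("a", "A"), ("b", "B")]
templates = {
    "terse": "Reply in one word. Uppercase this: {input}",
    "plain": "Uppercase this: {input}",
}
print(compare_prompts(templates, fake_model, cases))
```

Exact-match scoring is the simplest choice here; with tool calls or agent hand-offs in the loop you would likely need a task-specific checker instead, which is exactly where evaluation gets expensive.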

I am still puzzled how the distance between policies can affect the output, and how a simple retry fixes everything.


Replies

thesehands · yesterday at 8:25 PM

This is very much what DSPy aims to address. Learning the incantations needed to prompt well can be replaced by an algorithmic loop and labelled example cases.
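To make "an algorithmic loop and labelled example cases" concrete, here is a simplified sketch of one such loop: greedily pick few-shot demonstrations from a labelled training set, keeping each one only if it raises accuracy on a held-out dev set. It is loosely in the spirit of DSPy's few-shot optimizers but is not DSPy's API or its actual algorithm. `model`, `build_prompt`, and `fake_model` are hypothetical; the stub only "follows the format" once it has seen a worked demo.

```python
from typing import Callable

Example = tuple[str, str]        # (input, expected output)
Model = Callable[[str], str]     # hypothetical LLM call: prompt -> text


def build_prompt(instruction: str, demos: list[Example], x: str) -> str:
    """Instruction, then worked demos, then the new input to answer."""
    shots = "".join(f"Input: {i}\nOutput: {o}\n\n" for i, o in demos)
    return f"{instruction}\n\n{shots}Input: {x}\nOutput:"


def accuracy(instruction: str, demos: list[Example],
             model: Model, dev: list[Example]) -> float:
    hits = sum(model(build_prompt(instruction, demos, x)).strip() == y
               for x, y in dev)
    return hits / len(dev)


def select_demos(instruction: str, model: Model, train: list[Example],
                 dev: list[Example],
                 max_demos: int = 3) -> tuple[list[Example], float]:
    """Greedily add the training example that most improves dev accuracy;
    stop when no candidate helps or max_demos is reached."""
    demos: list[Example] = []
    best = accuracy(instruction, demos, model, dev)
    while len(demos) < max_demos:
        candidates = [ex for ex in train if ex not in demos]
        if not candidates:
            break
        scored = [(accuracy(instruction, demos + [ex], model, dev), ex)
                  for ex in candidates]
        score, ex = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        demos.append(ex)
        best = score
    return demos, best


def fake_model(prompt: str) -> str:
    # Hypothetical stub: reverses the last input, but answers tersely
    # only when the prompt already contains at least one worked demo.
    x = prompt.rsplit("Input: ", 1)[-1].split("\n", 1)[0]
    answer = x[::-1]
    return answer if prompt.count("Input: ") > 1 else f"The reverse is {answer}"


train = [("abc", "cba"), ("hello", "olleh")]
dev = [("dog", "god"), ("star", "rats")]
print(select_demos("Reverse the word.", fake_model, train, dev))
```

Greedy selection keeps the number of model calls roughly proportional to `len(train) * max_demos * len(dev)`, which matters when each call costs money; real optimizers may also search over instructions, not just demos.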