I guess this really depends on the problem, but from the PromptWizard (PW) paper:
| Approach | API calls | IO tokens/call | Total tokens | Cost ($) |
|----------|-----------|----------------|--------------|----------|
| INSTINCT | 1730 | 67 | 115,910 | 0.23 |
| InstructZero | 18,600 | 80 | 1,488,000 | 2.90 |
| PromptBreeder (PB) | 5000 | 80 | 400,000 | 0.80 |
| EvoPrompt (EvoP) | 69 | 362 | 24,978 | 0.05 |
| PromptWizard (PW) | 69 | 362 | 24,978 | 0.05 |
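(The columns are internally consistent: in each row, total tokens = API calls × IO tokens per call, e.g. 69 × 362 = 24,978 for PW, and the cost column scales with total tokens at roughly $2 per million.)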
They ascribe the efficiency gain to a balance between exploration and exploitation: a first phase of instruction mutation, followed by a phase where the instruction and the few-shot examples are optimized jointly. They also rely on "textual gradients" (critiques enhanced by chain-of-thought reasoning) as well as on synthesizing examples and counter-examples.

What I gathered from reading those papers, plus a few more, is that textual feedback, i.e. using an LLM to reason about how to carry out each step of the optimization, is what gives structure to the search space.
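To make the "textual gradient" idea concrete, here is a minimal sketch of one critique-then-rewrite step. This is my own illustration, not PromptWizard's actual code, and `call_llm` is a hypothetical stand-in for whatever chat-completion client you use:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    raise NotImplementedError

def textual_gradient_step(instruction: str, failures: list[str]) -> str:
    """One critique-then-rewrite iteration on a candidate instruction."""
    # The "gradient": a chain-of-thought critique grounded in concrete
    # failure cases collected from a small evaluation set.
    critique = call_llm(
        "Think step by step. Here is a task instruction:\n"
        f"{instruction}\n"
        "It produced wrong answers on these examples:\n"
        + "\n".join(failures)
        + "\nExplain why the instruction leads to these mistakes."
    )
    # Apply the gradient: rewrite the instruction to address the critique,
    # analogous to a mutation step guided by textual feedback.
    return call_llm(
        f"Instruction:\n{instruction}\n\nCritique:\n{critique}\n\n"
        "Rewrite the instruction to fix the issues raised in the critique. "
        "Return only the new instruction."
    )
```

The point of the sketch is that the critique, not random mutation, directs where the next candidate lands in the search space.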
Super interesting.
I will have to read it - I will be looking to figure out whether the tasks they are working on are significant/realistic, and whether the improvements they report are robust.