This is exactly what I'm doing. Some papers I'm studying:
TextGrad: Automatic "Differentiation" via Text: https://arxiv.org/abs/2406.07496
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow: https://arxiv.org/abs/2501.16673
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs: https://arxiv.org/abs/2406.16218
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers: https://arxiv.org/abs/2412.09722
PromptWizard: Task-Aware Prompt Optimization Framework: https://arxiv.org/abs/2405.18369
I was trying to pick n-shot examples from a dataset. The idea: given thousands of candidate examples for a prompt, finding the optimal combination of n could be advantageous, but for large n, brute-forcing every combination is infeasible. So can we find an optimal set with an efficient search?
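To get a feel for why brute force blows up, here's a quick sketch (the pool size of 1000 is illustrative, matching the "1000s of examples" above):

```python
import math

# Number of n-sized subsets you'd have to score from a pool of 1000
# candidate examples. Even n=5 is already in the trillions.
pool_size = 1000
for n in (1, 2, 5, 10):
    print(f"n={n}: {math.comb(pool_size, n):,} combinations")
```

At one prompt evaluation per combination, anything beyond n=2 or so is clearly out of reach, which is what motivates an efficient (e.g. greedy or gradient-guided) search instead.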
But the problem was that the search space wasn't informative: the best single example didn't appear in the best pair of examples, so I couldn't build up greedily to an optimal set of 5, 6, or 7 examples.
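That failure mode is exactly what breaks greedy forward selection. A toy sketch (the `score` function is a made-up stand-in for "prompt quality with these examples", not a real evaluator) shows how a pairwise interaction makes the best singleton absent from the best pair:

```python
import itertools

def greedy_select(pool, n, score):
    """Greedy forward selection: grow the example set one pick at a time.
    This only finds the optimum if the best k-set extends to the best
    (k+1)-set -- precisely the property that failed in practice."""
    chosen, remaining = [], list(pool)
    for _ in range(n):
        best = max(remaining, key=lambda ex: score(chosen + [ex]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical score with an interaction term: "b" and "c" are weak
# alone but strong together, so the best singleton is "a" while the
# best pair is ("b", "c").
def score(examples):
    s = sum({"a": 3, "b": 2, "c": 2}.get(e, 0) for e in examples)
    if "b" in examples and "c" in examples:
        s += 10  # strong pairwise interaction
    return s

greedy = greedy_select(["a", "b", "c"], 2, score)   # commits to "a" first
brute = max(itertools.combinations(["a", "b", "c"], 2),
            key=lambda t: score(list(t)))           # exhaustive over pairs
```

Because the objective isn't monotone in this way (it's non-submodular: examples interact), the greedy path is misleading, and the search space gives no usable signal for composing larger sets from smaller winners.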