I spent a lot of time last summer trying to get prompts to optimise using various techniques, and I found that the search space was just too big to make real progress. Sure, I found a few small improvements across iterations, but actual optimisation? Not so much.
So I am pretty skeptical of using such unsophisticated methods to create or improve such sophisticated artifacts.
This is exactly what I'm doing. Some papers I'm studying:
TextGrad: Automatic "Differentiation" via Text: https://arxiv.org/abs/2406.07496
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow: https://arxiv.org/abs/2501.16673
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs: https://arxiv.org/abs/2406.16218
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers: https://arxiv.org/abs/2412.09722
PromptWizard: Task-Aware Prompt Optimization Framework: https://arxiv.org/abs/2405.18369