antirez: I'm curious, with the final code, have you experimented with effectively one-shotting the final result? I wonder if we can get there with GEPA, and maybe there's something we can learn about how to elicit/prompt these models to get what we want.
Or maybe the conclusion is that model providers need to clean up their training data!