the "Prompt Management" part of these products always seemed odd. Does anyone use it? Why?
I do understand why it's a product - it feels a bit like what Databricks has with model artifacts. I.e. having a repo of prompts you can track performance changes against is good, especially if you have users other than engineers touching them (e.g. a product manager who wants to A/B test).
Having said that, I struggled a lot with actually implementing Langfuse due to numerous bugs and confusing AI-driven documentation. So I'm amazed that it's being bought, to be really frank. I was just on the free version in order to look at it and make a broader recommendation, and I wasn't particularly impressed. Mileage may vary though, perhaps it's a me issue.
Iterating on LLM agents involves testing on production(-like) data. The most accurate way to tell whether your agent is performing well is to watch it work in production.
You want the best results you can get from a prompt, so you use features like prompt management and A/B testing to see which version of your prompt performs better in production (i.e. which version is the best fit for the model you are using).
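For what it's worth, the mechanics of that are pretty simple. A minimal sketch of prompt A/B testing in Python (prompt names, prompt text, and the call_llm stub are all made up for illustration, not any vendor's API; the bucketing is just the usual deterministic-hash approach):

    import hashlib

    # Two versions of the same prompt, e.g. pulled from your prompt repo.
    PROMPT_VARIANTS = {
        "A": "Summarize the ticket in one sentence:\n{ticket}",
        "B": "You are a support triage bot. Summarize this ticket briefly:\n{ticket}",
    }

    def assign_variant(user_id: str) -> str:
        """Deterministically bucket a user so they always get the same prompt version."""
        bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
        return "A" if bucket < 50 else "B"

    def call_llm(prompt: str) -> str:
        # Stand-in for whatever model client you actually use.
        return f"<model output for: {prompt[:40]}...>"

    def handle_ticket(user_id: str, ticket: str) -> str:
        variant = assign_variant(user_id)
        prompt = PROMPT_VARIANTS[variant].format(ticket=ticket)
        response = call_llm(prompt)
        # Log the variant with each request so you can later compare
        # success metrics (resolution rate, thumbs-up, etc.) per version.
        print(f"user={user_id} variant={variant}")
        return response

    print(handle_ticket("user-123", "My invoice is wrong."))

The value of the prompt management product isn't this logic, it's that the variants live in a versioned store rather than in code, so non-engineers can change them and you can attribute metric shifts to specific prompt versions.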