logoalt Hacker News

peterbell_nyctoday at 6:14 PM2 repliesview on HN

I auto tune my prompts to a locked model version based on production data used as evals with holdback data. I think the use case for this would be one off interactive prompts? For now I just run those all against an Opus 4.8 MAX and I'm sure I could downtune, although for interactive my opening prompt isn't always reflective of my overall goals for the multi turn session.

I'm just trying to figure out why on the fly routing would beat testing and tuning and locking models and versions for each class of call, with evals and auto tunes running to explore more possible models for commonly run classes of prompt over time . . .


Replies

gopher_spacetoday at 9:12 PM

"Based on your subscription tier and local hardware here's a list of models that fit and process definitions your biggest brain will comfortably handle."

I guess that sounds a lot like moving your evals and auto tunes to a third-party, but I don't have the time, budget, or inclination to create a system like this out of whole cloth and then keep it relevant.

I could see something that provides on-the-fly routing information being useful, but actual decision-making is too dependent on context.

adchurchtoday at 6:19 PM

[dead]