Hacker News

gwd, today at 12:19 AM

OK, so a while back I set up a workflow to do language tagging. There were 6-8 stages in the pipeline where it would go out to an LLM and come back. Each one had its own prompt that had to be tweaked to get decent results. I was only doing it for a smallish batch (150 short conversations) and only for private use, but I definitely wouldn't switch models without doing another informal round of quality assessment and prompt tweaking. If this were something I was using in production, there would be a whole different level of testing and quality required before switching to a different model.
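A rough sketch of what that informal check could look like, not the actual workflow: run the same prompts through the old and new model and look at where the tags differ. Everything here is a stand-in. `Model`, `agreement_rate`, `disagreements`, and the two stub models are hypothetical names, and a real run would wrap actual LLM calls instead of the stubs.

```python
from typing import Callable, List, Tuple

# Hypothetical: a "model" is any callable mapping a prompt string to a tag string.
Model = Callable[[str], str]


def _norm(tag: str) -> str:
    """Normalize a tag so trivial whitespace/case differences don't count."""
    return tag.strip().lower()


def disagreements(old: Model, new: Model, prompts: List[str]) -> List[Tuple[str, str, str]]:
    """Return (prompt, old_tag, new_tag) for every prompt where the tags differ."""
    rows = []
    for p in prompts:
        a, b = _norm(old(p)), _norm(new(p))
        if a != b:
            rows.append((p, a, b))
    return rows


def agreement_rate(old: Model, new: Model, prompts: List[str]) -> float:
    """Fraction of prompts on which both models give the same normalized tag."""
    if not prompts:
        raise ValueError("need at least one prompt")
    return 1 - len(disagreements(old, new, prompts)) / len(prompts)


# Stub models standing in for real LLM calls (assumption, for illustration only).
def stub_old(prompt: str) -> str:
    return "fr" if "bonjour" in prompt.lower() else "en"


def stub_new(prompt: str) -> str:
    text = prompt.lower()
    if "bonjour" in text:
        return "FR "
    return "de" if "hallo" in text else "en"


sample = ["Bonjour tout le monde", "Hello there", "Hallo Welt", "Good morning"]
rate = agreement_rate(stub_old, stub_new, sample)
diffs = disagreements(stub_old, stub_new, sample)
```

Agreement with the old model isn't the same as quality, so the useful part is reading through `diffs` by hand and deciding which model was right on each one.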


Replies

0xbadcafebee, today at 1:00 AM

The big providers are gonna deprecate old models after a new one comes out. They can't make money off giant models sitting on GPUs that aren't taking constant batch jobs. If you wanna avoid re-tweaking, open weights are the way. Lots of companies host open-weights models, and they're dirt cheap. Tune your prompts on one of those, and if one provider stops supporting that model, another will, or worst case you could run it yourself. Open weights are now consistently near SOTA, only a month or two behind the big providers. But if they're short, simple prompts, even older, smaller models work fine.
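A minimal sketch of the "if one provider drops it, another will" idea, assuming you keep the model name pinned and only swap hosts. The provider names, the model name `example-open-model`, and the `call(model, prompt)` signature are all made up for illustration; real hosts each have their own client APIs.

```python
from typing import Callable, List, Tuple

# Hypothetical provider shape: (name, call(model, prompt) -> completion text).
Provider = Tuple[str, Callable[[str, str], str]]


def complete_with_fallback(providers: List[Provider], model: str, prompt: str) -> Tuple[str, str]:
    """Try each host in order with the same pinned model; return (host, text)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(model, prompt)
        except Exception as exc:  # broad on purpose: any failure means "try next host"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub hosts standing in for real API clients (assumption, for illustration only).
def host_a(model: str, prompt: str) -> str:
    raise LookupError(f"{model} is no longer hosted")


def host_b(model: str, prompt: str) -> str:
    return f"[{model}] tag: en"


PINNED_MODEL = "example-open-model"  # hypothetical model name
used_host, text = complete_with_fallback(
    [("host-a", host_a), ("host-b", host_b)], PINNED_MODEL, "Tag the language: Hello"
)
```

Since the weights are the same everywhere, the tuned prompts should mostly carry over between hosts, though quantization or serving settings can differ, so a quick spot check after a switch is still worth doing.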