Hacker News

weitendorf · yesterday at 7:12 AM

> fine tune a general-purpose LLM (somehow, as it is much cheaper than starting from scratch, idk?) to behave this way rather than instructing it all the way in

I'd love to do that too, but afaik there are basically three ways to teach LLMs how to use it: with data created "in the wild" plus a degree of curation or augmentation; with full-on reinforcement learning / goal-oriented training; or with some kind of hybrid based on e.g. conformance testing, validating LLM output at a less sophisticated level (e.g. if it tries to call an API that's not in the set it just saw during discovery, the LLM is being dumb; train it out of doing that).
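That last conformance check is simple enough to sketch. A minimal illustration in Python, assuming the model emits tool calls as JSON with an "api" field; the function and field names here are hypothetical, not from any real protocol:

```python
# Hypothetical sketch of the conformance check described above: reject any
# tool call whose API name was not in the set the model just saw during
# discovery. Names ("api", validate_tool_call, etc.) are illustrative.
import json

def validate_tool_call(call_json: str, discovered_apis: set[str]) -> bool:
    """Return True iff the model's proposed call targets a discovered API."""
    try:
        call = json.loads(call_json)
    except json.JSONDecodeError:
        return False  # malformed output fails conformance outright
    return call.get("api") in discovered_apis

discovered = {"listInvoices", "getInvoice"}
print(validate_tool_call('{"api": "listInvoices", "args": {}}', discovered))   # True
print(validate_tool_call('{"api": "deleteInvoice", "args": {}}', discovered))  # False
```

Failures of a check like this could then be turned into negative training signal (or rejection sampling) without needing a full reward model.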

The thing is, these approaches are not really mutually exclusive, and LLM companies will do this anyway to make their models useful if enough people are using, or want to use, the thing in question. This is what has already happened with e.g. MCP, skills, and many programming languages. Anyway, if prompting gets a model to use it properly, that validates that the model can be trained to follow the same process, the same way it knows how to work with React.


Replies

tokioyoyo · yesterday at 7:32 AM

I see, makes sense! I’ll try to keep up with what you guys are doing and how you overcome these problems. Thanks a lot!