Hacker News

antirez · yesterday at 2:49 PM

Fine-tuning is a nice story to tell, but with modern LLMs it makes less and less sense. Modern LLMs are so powerful that they can few-shot learn complicated things, so a strong prompt and augmenting the generation (given the massive context window of Qwen3.5, too) is usually the best option available. There are models for which fine-tuning is great, like image models: there, with LoRA, you can get good results in many ways. And it made sense for certain use cases with LLMs of the past. But now, why? LLMs are already released after seeing (after pre-training) massive datasets for SFT and then RL. Removing the censorship is done much more efficiently with other techniques. So I have a strong feeling that fine-tuning will become less relevant every day, and it already is quite irrelevant. This, again, is in the specific case of LLMs. For other foundation models fine-tuning still makes sense and is useful (images, text-to-speech, ...).


Replies

prettyblocks · yesterday at 2:59 PM

I think the biggest case for fine-tuning is probably that you can take small models, fine-tune them for applications that require structured output, and then run cheap inference at scale. "Frontier LLMs can do it with enough context" is not really a strong argument against fine-tuning, because they're expensive to run.
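As a rough illustration of the structured-output use case, here is what preparing a small SFT dataset for it could look like. Everything here is hypothetical: the triage task, the schema, and the chat-style JSONL format are stand-ins, not a reference to any specific toolchain.

```python
import json

# Hypothetical example: an SFT dataset teaching a small model to emit strict
# JSON for a support-ticket triage task. Field names and schema are invented.
def make_example(ticket_text: str, category: str, priority: str) -> dict:
    """One prompt/completion pair in a common chat-style SFT format."""
    return {
        "messages": [
            {"role": "system",
             "content": 'Reply with JSON: {"category": ..., "priority": ...}'},
            {"role": "user", "content": ticket_text},
            {"role": "assistant",
             "content": json.dumps({"category": category, "priority": priority})},
        ]
    }

examples = [
    make_example("App crashes on login since the update.", "bug", "high"),
    make_example("Can I export my data as CSV?", "question", "low"),
]

# JSONL, the format most fine-tuning toolchains accept.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

A few hundred to a few thousand rows like this is often the whole dataset; the point is that the small model then emits the schema reliably without a long prompt on every call.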

bravura · yesterday at 4:36 PM

For me, the goal was fine-tuning a model to write "best day" prose, i.e. output I would accept over 80% of the time.

You are correct if we are talking about knowledge.

However, it is bad at hyper-idiosyncratic, gritty style transfer.

I first noticed the issue when asking Claude Code to draft email responses. The choice of register was off. ("Register" in writing refers to the level of formality and tone chosen to suit a specific audience, purpose, and context.)

I decided to take all my HN comments, rewrite them in various styles of bad LLM prose, and see if I could use DSPy to optimize a prompt using in-context learning (ICL; I give it 10 examples of my HN comments). The results were abysmal. RLHF-tuned frontier LLMs have a deep-seated aversion to the target stylistic distribution of my comments.

I tried fine-tuning qwen3, llama, and gemma models. The instruct models are already so heavily tuned that they could not be tuned further. This is using several hundred comments as gold targets, with 5 different LLM degradations per gold as the input.
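The dataset construction described above can be sketched as follows. The pairing runs degraded text in, gold comment out, so the model learns to recover the original style; the `fake_degrade` function is a stand-in for the actual "rewrite in bad LLM prose" calls, which the comment says were done with an LLM.

```python
# Sketch of a style-transfer dataset: each gold comment gets several
# LLM-degraded rewrites, and training pairs run degraded -> gold.
def build_style_pairs(gold_comments, degrade, n_variants=5):
    pairs = []
    for gold in gold_comments:
        for i in range(n_variants):
            pairs.append({"input": degrade(gold, i), "target": gold})
    return pairs

# Stand-in for the real LLM degradation step (purely illustrative).
def fake_degrade(text, variant):
    return f"[variant {variant}] {text.upper()}"

pairs = build_style_pairs(
    ["fine tuning is overrated", "context is king"], fake_degrade)
```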

danielhanchen · yesterday at 3:20 PM

These are fair points, considering LLMs are getting smarter and better every week - but to be fair, the biggest benefits of finetuning / RL are still not yet realized:

1. If we have robots at home, they will need some sort of efficient continual learning, which could be on-the-go finetuning / RL via a small LoRA. This will need multimodal finetuning with sparse reward signals. One could also imagine all data being aggregated to one central processing center after anonymization, and a larger model being trained with more data + RL that way.

2. Agreed, images, audio, video etc. are still what LoRA does well - the guide at https://unsloth.ai/docs/models/qwen3.5/fine-tune is actually a vision + text finetuning guide, so you can finetune the vision layers on your own use case.

3. Model routing is going to become more the norm in the future - i.e. small local models with LoRA for continuous finetuning can be used, while complex tasks are offloaded to a large LLM in the cloud.

4. I also wrote about more use cases below on the post - DoorDash, Vercel, Mercor, Stripe, NASA, Perplexity, Cursor and many others all do finetuning - e.g. Cursor and Perplexity finetune large OSS LLMs themselves for their specific product lines - so there is definitely value if you have the data for it.
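The routing idea in point 3 can be sketched minimally. The complexity heuristic and both backends here are invented stand-ins; a real router would score the prompt with a classifier and call actual local and cloud endpoints.

```python
# Toy router: cheap local model for routine requests, cloud LLM for the rest.
def local_model(prompt):
    return f"local:{prompt}"

def cloud_model(prompt):
    return f"cloud:{prompt}"

def route(prompt, max_local_tokens=50):
    # Crude proxy for task complexity: long or multi-step prompts go to
    # the cloud; everything else stays on the local LoRA-tuned model.
    hard = len(prompt.split()) > max_local_tokens or "step by step" in prompt
    return cloud_model(prompt) if hard else local_model(prompt)
```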

mountainriver · today at 12:52 AM

If that were true, we would be able to run working agents out of the box on any domain. We are still far from that; for reliability in most applications you need fine-tuning.

For any new modality you need fine-tuning.

For voice, image, and video models you need fine-tuning.

For continual learning you (often) need fine-tuning.

For any domain that is somewhat OOD you need fine-tuning.

To fully ground a model you need fine-tuning.

abhgh · yesterday at 3:52 PM

They are great for specialized use cases where: (a) the problem is not hard enough to need reasoning, (b) it is not diverse enough to need a world model, (c) you want cheap inference (and you can make it happen hardware-wise), and (d) you either have enough data or a workflow that accumulates data. With enough data, a fine-tuned model can sometimes beat a premier model while ensuring low latency - of course, assuming (a) and (b) apply.

I make it sound like a rare perfect storm needs to exist to justify fine-tuning, but these circumstances are not uncommon - to an extent, (a), (c) and (d) were already prerequisites for deploying traditional ML systems.

joefourier · yesterday at 3:56 PM

Fine-tuning still makes sense for cost- and latency-sensitive applications. Massive context windows drastically slow down generation, and modern models' performance and instruction-following ability rely heavily on a reasoning step that can consume orders of magnitude more tokens than the actual response (depending on the application), while a fine-tuned model can skip or significantly shorten that step.

Using the large model to generate synthetic data offline with the techniques you mentioned, then fine-tuning the small model on it, is an underrated technique.
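The offline distillation loop described above reduces to something like the following. The teacher here is a stub returning a canned string; in practice it would be an API call to a frontier model using the full prompt/RAG setup, and the resulting records would feed a standard SFT run on the small model.

```python
# Offline distillation sketch: a large "teacher" labels prompts once, and
# the outputs become SFT data for a small student model.
def distill(prompts, teacher):
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

# Stand-in for an expensive frontier-model call (illustrative only).
def stub_teacher(prompt):
    return f"answer({prompt})"

dataset = distill(["q1", "q2", "q3"], stub_teacher)
```

The appeal is that the expensive model runs once per training example instead of once per production request.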

sweaterkokuro · yesterday at 4:07 PM

As strong as current LLMs are, they are still often easily distracted from the task. At production scale, fine-tuning can make a lot more sense, given that you provide the model a very specific task.

andsoitis · yesterday at 4:21 PM

For agentic coding, which do you prefer:

a) qwen3-coder

b) qwen3.5 (general)

ranger_danger · yesterday at 2:59 PM

Where it makes sense, IMO, is when you need the model to know about a large amount of information that's not already in it, such as a company knowledge base, code repositories, or a trove of specialized legal documents. In that case it's not realistic to try to stuff all of that into the context window on every request, especially if you're trying to make a responsive chat bot.
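Some back-of-envelope arithmetic shows why stuffing a knowledge base into every request gets expensive. All the numbers below are illustrative assumptions, not measured figures or real pricing.

```python
# Back-of-envelope cost of context-stuffing vs. baking knowledge in.
# All numbers are illustrative assumptions.
kb_tokens = 500_000          # assumed size of the knowledge base in tokens
price_per_mtok = 1.0         # assumed $ per million input tokens
requests_per_day = 10_000

# Stuffing the whole KB into every request:
daily_cost = kb_tokens / 1_000_000 * price_per_mtok * requests_per_day

# With the knowledge baked in via fine-tuning, the per-request prompt
# stays small:
per_request_tokens = 1_000
daily_cost_ft = per_request_tokens / 1_000_000 * price_per_mtok * requests_per_day
```

Under these assumptions that's $5,000/day vs. $10/day, before even counting the latency of prefilling half a million tokens per request.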

KronisLV · yesterday at 3:44 PM

> But now, why?

Because these models are good in general but their Latvian output is half-drivel, like the roots of the words are usually the right ones, but not the rest.

That, and EuroLLM is really slow to release new models that would be similarly good off the shelf.

esafak · yesterday at 3:15 PM

I would like model adaptation algorithms like Doc-to-LoRA (https://pub.sakana.ai/doc-to-lora/) to go mainstream.