You can fine-tune a model, but there are also smaller models fine-tuned for specific work like structured output and tool calling. You can build automated workflows that are largely deterministic and only slot in these models where you specifically need an LLM to do a bit of inference. If frontier models are a sledgehammer, this approach is the scalpel.
A common example: people are moving simple tasks in their OpenClaw setups, like tagging emails and summarizing articles, off of expensive Anthropic APIs and onto cheaper models.
Combined with memory systems, internal APIs, or just good documentation, a lot of tasks don't actually require much compute.
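A minimal sketch of that pattern: deterministic rules handle the cheap, unambiguous cases, and only what falls through reaches a model. The function and label names here are hypothetical, and `classify_with_small_model` is a stand-in for a call to whatever cheap fine-tuned model you'd actually use.

```python
def classify_with_small_model(subject: str) -> str:
    # Placeholder for a call to a small model fine-tuned for
    # structured output (e.g. returning one label from a fixed set).
    # Simulated here so the sketch is self-contained and runnable.
    return "newsletter" if "digest" in subject.lower() else "misc"

def route_email(email: dict) -> str:
    """Route an email, calling a model only when rules can't decide."""
    sender = email["from"].lower()
    subject = email["subject"]

    # Deterministic rules: free, fast, and auditable.
    if sender.endswith("@payroll.example.com"):
        return "finance"
    if subject.startswith("Re:"):
        return "thread-reply"

    # Only the ambiguous leftovers cost an inference call.
    return classify_with_small_model(subject)
```

The point of the structure is that the expensive (or in this sketch, simulated) call sits behind the rules, so most traffic never touches it, and you can swap the model out without touching the routing logic.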