I'm afraid Claude Code will start doing this in the background without telling you.
> It's the same reason you don't want to lead a debugging session by saying "I think the problem is in this file": you've biased the investigation before it started.
Unless you're evaluating the agent/person running the debugging session, why would you not provide them with some relevant insight about the problem you have? Assuming you're pretty sure about it, of course.
I have rewritten the article to be slightly shorter:
“Let a cheap agent decide if the expensive one is needed.”
The whole clickbait article can be summarized in one line:
Let a cheap agent decide if the expensive one is needed.
I want to create a "harness" that does this with Claude Code and other expensive agents.
Buffer user prompts, use conversation history and repo state as context, and run a local model or a cheap, fast cloud model like Haiku to determine the optimal way to address the user's ask, reframe the query with better context (the user reviews and approves if needed), and THEN let an expensive model like Opus have a go at it.
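A minimal sketch of that routing idea, assuming the Anthropic Python SDK; the model IDs, triage prompt, and helper names are all placeholders rather than an actual Claude Code feature:

```python
# Hypothetical sketch: a cheap model decides whether the expensive one is needed.
# Model IDs and the triage prompt are placeholders, not a real Claude Code feature.
import anthropic

client = anthropic.Anthropic()
CHEAP_MODEL = "claude-haiku-latest"      # placeholder ID
EXPENSIVE_MODEL = "claude-opus-latest"   # placeholder ID

def triage(prompt: str, history: str, repo_state: str) -> str:
    """Ask the cheap model to classify the request and reframe it with context."""
    resp = client.messages.create(
        model=CHEAP_MODEL,
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                "Decide if this request is TRIVIAL or COMPLEX, then rewrite it "
                "with the context it needs.\n\n"
                f"Request: {prompt}\nHistory: {history}\nRepo state: {repo_state}"
            ),
        }],
    )
    return resp.content[0].text

def handle(prompt: str, history: str, repo_state: str) -> str:
    reframed = triage(prompt, history, repo_state)
    # User review/approval of the reframed query would slot in here.
    model = CHEAP_MODEL if reframed.startswith("TRIVIAL") else EXPENSIVE_MODEL
    resp = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": reframed}],
    )
    return resp.content[0].text
```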
If we are operating within the Anthropic ecosystem with Haiku and Opus, this sort of logic should ideally be doable within Claude Code as the harness. Currently, skills cannot be tagged to different models. Ideally we should be able to say that for trivial tasks, the skill should always use Haiku even if invoked from a session running Opus xhigh.
Is RAG dead? I would be very surprised if a small local SOTA embedding model like llama-embed-nemotron-8b didn't outperform the Haiku layer for this application. Should be pretty cheap and easy to prove out. With a 32K context size, you can literally one-shot the whole ticket.
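A rough sketch of what that embedding-based check could look like; `embed()` stands in for whatever serving setup you put llama-embed-nemotron-8b behind, and the similarity threshold is a guess:

```python
# Hypothetical sketch: use embedding similarity instead of a Haiku call to decide
# whether a failure is already tracked. embed() and the 0.85 threshold are assumptions.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a call to a local embedding model (e.g. llama-embed-nemotron-8b)."""
    raise NotImplementedError

def already_tracked(new_failure: str, known_failures: list[str],
                    threshold: float = 0.85) -> bool:
    q = embed(new_failure)
    q = q / np.linalg.norm(q)
    for known in known_failures:
        k = embed(known)
        k = k / np.linalg.norm(k)
        if float(q @ k) >= threshold:  # cosine similarity of normalized vectors
            return True
    return False

# Only escalate to the expensive model when nothing similar has been seen before:
# if not already_tracked(ticket_text, open_issue_texts): escalate_to_opus(ticket_text)
```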
> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.
I'm planning to self-host qwen3.6 27b basically for this purpose.
Looking at the diagram, is this seriously a case of handing basic functional concepts like "write to ClickHouse" or "have we seen this before" to a model? Could those be actual function calls in some language?
Just seems wasteful all around. Having an agent in the critical path when a regular expression (or similar) would do just seems odd. Yeah, Haiku is cheap, but re.match() is cheaper.
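For what it's worth, the cheapest possible first pass is plain pattern matching on failure signatures you already know; a toy example (the patterns are made up):

```python
import re

# Toy example: known failure signatures handled before any model is invoked.
KNOWN_SIGNATURES = [
    re.compile(r"ConnectionResetError"),
    re.compile(r"TimeoutError: .* exceeded"),
]

def is_known_failure(log_line: str) -> bool:
    """Return True if the line matches a signature we already track."""
    return any(p.search(log_line) for p in KNOWN_SIGNATURES)
```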
I do a similar thing with a "planner agent" that uses the cheapest model (I think it's openai-gpt-5.2-mini or something, at like 20 cents per 1M) and more or less emits a plan name and a task list, with a recommended model for each task. It's not perfect, but many of our tasks are accomplished with lighter-weight models. When doing code generation or fixing, we upgrade to a more expensive model; planning and decisions are done more cheaply. Keep in mind the tasks are relatively constrained, so planning with a cheap agent makes sense here. An open-ended agent would likely use a more expensive call for planning.
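A sketch of the kind of plan object such a planner might emit; the schema and model names are illustrative, not the commenter's actual setup:

```python
# Illustrative plan structure: a cheap planner emits tasks, each tagged with a
# recommended model; an executor then dispatches each task to that model.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    recommended_model: str  # e.g. "cheap-mini" for planning, "frontier" for codegen

@dataclass
class Plan:
    name: str
    tasks: list[Task]

example = Plan(
    name="fix-flaky-test",
    tasks=[
        Task("locate the failing test and collect logs", "cheap-mini"),
        Task("rewrite the test and patch the race condition", "frontier"),
    ],
)
```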
> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.
> 4 out of 5 failures never reach Opus. A triager match costs around 25x less than a full investigation.
The title feels misleading. Why clickbait when you could just be genuine about the architecture?