
viksit | last Saturday at 8:52 PM | 6 replies

I was experimenting with how local, learnable routers can reduce token overhead and lower costs, and decided to publish a post about it. The main goal is to delegate tool calls to a PyTorch-based learner, with examples of how to integrate this into a DSPy pipeline. Feedback welcome!
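The core idea of a local, learnable router can be sketched as a tiny PyTorch classifier that maps a query embedding to one of N tools. This is a minimal illustration, not the post's actual code; the tool names, dimensions, and the random input vector are all placeholders (in practice the input would be a sentence embedding of the user query):

```python
import torch
import torch.nn as nn

class ToolRouter(nn.Module):
    """Tiny MLP that maps a query embedding to logits over N tools."""
    def __init__(self, embed_dim: int, num_tools: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_tools),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # logits over tools

TOOLS = ["search", "calculator", "none"]  # hypothetical tool set
router = ToolRouter(embed_dim=384, num_tools=len(TOOLS))

# Random vector stands in for a real query embedding, just to show shapes.
query_emb = torch.randn(1, 384)
logits = router(query_emb)
choice = TOOLS[logits.argmax(dim=-1).item()]
```

Trained on (query, tool) pairs with cross-entropy loss, a router like this decides locally which tool to invoke, so the LLM never spends tokens on tool selection.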


Replies

rybosome | yesterday at 1:20 AM

Thanks for the informative and inspiring post! This is definitely cool, and I can imagine it being very useful.

However, I do want to mention that the “recommended” flow these days isn’t to separate out a tool request the way you have. E.g., instead of asking an LLM to route a tool, extracting that, running the tool, passing output back to the LLM, etc., you simply pass the tool definitions, prompt, and structured-output expectations, and let the LLM (and your caller library) manage the tool-use loop.
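That tool-use loop can be sketched as below. Everything here is a stand-in: `call_llm` mocks what a provider SDK's chat-completions call with a `tools` parameter would return, and the weather tool and its canned responses are hypothetical, so the shape of the loop is the point, not the specific API:

```python
import json

# Hypothetical stand-in for a chat-completions call; a real client would
# return a message that either requests tool calls or gives the final answer.
def call_llm(messages, tools):
    if not any(m["role"] == "tool" for m in messages):
        # First turn: the model asks to call the (stubbed) weather tool.
        return {"role": "assistant",
                "tool_calls": [{"id": "call_1", "name": "get_weather",
                                "arguments": json.dumps({"city": "Paris"})}]}
    # Once the tool result is in context, the model answers in plain text.
    return {"role": "assistant", "content": "It is 18 C in Paris."}

def get_weather(city):
    return {"city": city, "temp_c": 18}  # stub tool implementation

TOOL_IMPLS = {"get_weather": get_weather}
TOOL_DEFS = [{"name": "get_weather", "parameters": {"city": "string"}}]

def run_tool_loop(user_prompt):
    """Pass tool definitions with the prompt; the model drives the loop."""
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        msg = call_llm(messages, TOOL_DEFS)
        messages.append(msg)
        if "tool_calls" not in msg:
            return msg["content"]
        for call in msg["tool_calls"]:
            result = TOOL_IMPLS[call["name"]](**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})

answer = run_tool_loop("What's the weather in Paris?")
```

The key contrast with a local router: here the model itself decides when and which tools to call, and the caller only executes them and feeds results back.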

That’s how these modern LLMs are trained in post-training, so I suspect you’ll get different (and potentially worse?) results by trying to subvert this with a small, local model.

Letting the LLM do this comes with all the downsides you mentioned, but it is also more likely to be in-distribution, and it’s easier to compose multiple tool calls.

Anyway, thanks for sharing! I’d love to see evals on a task comparing the result when an LLM is involved in tool selection versus when it is handed tool output only; if I’m wrong about the quality degradation, then there’s a lot to like about your local tool routing.

krohling | last Saturday at 10:19 PM

I think this is a creative approach. I wonder how the success rates for that little RNN compare to the success rates of the primary LLM, especially for complex queries or complex tool calls. At some point you have to scale that network up large enough to get better results. Eventually you've come back around and you might as well use an LLM. I think a similar approach with potentially better results (depends on the application) could be accomplished by using that same dataset to finetune a small language model. It'd be interesting to see some success rate comparisons.
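The success-rate comparison suggested here could start from a labeled set of (query, correct tool) pairs. Everything below is hypothetical: the dataset is toy data and the two "routers" are trivial stand-ins for the RNN router and an LLM-based selector, just to show the harness shape:

```python
def tool_accuracy(predict, dataset):
    """Fraction of queries routed to the labeled tool."""
    correct = sum(1 for query, tool in dataset if predict(query) == tool)
    return correct / len(dataset)

# Hypothetical labeled examples and two toy routers to compare.
dataset = [("2+2?", "calculator"), ("capital of France?", "search")]
rnn_router = lambda q: "calculator" if any(c.isdigit() for c in q) else "search"
llm_router = lambda q: "search"

print(tool_accuracy(rnn_router, dataset))  # 1.0
print(tool_accuracy(llm_router, dataset))  # 0.5
```

Running both routers (and a fine-tuned small LM) through the same harness, bucketed by query complexity, would make the trade-off the comment describes concrete.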

ctxc | last Saturday at 11:02 PM

Nit - code screenshots are a PITA to read on mobile!

zitterbewegung | last Saturday at 11:17 PM

Can you put all of the code into a gist or something?

bGl2YW5j | yesterday at 12:26 AM

Creative. You’ve given me some ideas. Thanks!

joe_the_user | last Saturday at 10:39 PM

My question is whether you have managed to make this work, i.e., perform a specific complex task, in some real-world situation.