Hacker News

Adaptive LLM routing under budget constraints

178 points by tdchaitanya yesterday at 4:57 PM | 71 comments

Comments

pbd yesterday at 5:49 PM

GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
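The arithmetic in this comment is easy to check. A minimal sketch, using the prices quoted above and treating the 20% error rate as the fraction of queries that end up on the expensive model (a hypothetical, not a benchmark):

```python
# Blended cost per million tokens when a router sends most traffic to a
# cheap model. Prices are the ones quoted in the comment (USD / 1M tokens).
GPT4_COST = 24.70
MIXTRAL_COST = 0.24

def blended_cost(fraction_cheap: float) -> float:
    """Cost per million tokens if `fraction_cheap` of traffic goes to
    Mixtral and the remainder is served by GPT-4."""
    return fraction_cheap * MIXTRAL_COST + (1 - fraction_cheap) * GPT4_COST

cost = blended_cost(0.80)          # 80% routed cheap, 20% on GPT-4
savings = GPT4_COST / cost         # vs. sending everything to GPT-4
# cost ≈ $5.13 per million tokens, i.e. roughly a 4.8x saving even
# with a fifth of traffic landing on the expensive model.
```

The saving is dominated by the price ratio, not the error rate: because GPT-4 is ~100x more expensive, the blended cost is essentially (fraction on GPT-4) × (GPT-4 price).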

QuadmasterXLII yesterday at 6:00 PM

The framing in the headline is interesting. As far as I recall, spending 4x more compute on a model to improve performance by 7% is the move that has worked over and over again up to this point. 101% of GPT-4 performance (potentially at any cost) is what I would expect an improved routing algorithm to achieve.

spoaceman7777 yesterday at 6:15 PM

Incredible that they are using contextual bandits, and named it: Preference-prior Informed Linucb fOr adaptive rouTing (PILOT)

Rather than the much more obvious: Preference-prior Informed Linucb For Adaptive Routing (PILFAR)

fny yesterday at 5:46 PM

Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?

hackathonguy today at 2:48 AM

I'm very curious whether a) anecdotally, anyone has encountered a real enterprise cost-cutting effort focused on LLM APIs and b) empirically, whether anyone has done any research on price elasticity in LLMs of different performance scales.

So far, my experience has been that it's just too early for most people / applications to worry about cost - at most, I've seen AI account for 10% of cloud costs. But very curious if others have other experiences.

danieltanfh95 today at 3:49 AM

Unless your application is relatively trivial, you will almost always want behaviour that is as consistent as possible, rather than optimizing some random metric used as a proxy for "performance". Routing is NOT the solution.

CuriouslyC yesterday at 8:53 PM

These router papers are popping up hard now. I have a gradient boosted router I've been playing with that ties into retrieval to provide adaptive routing. The truth about these routers is that you have to tune them on your workloads to get the full benefit; otherwise they test way better than they work in production. That's why I added the retrieval aspect to mine - without it, your top-line eval numbers and production reality are very different.
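The commenter's actual features and retrieval signal aren't shown, so this is only a generic sketch of the idea: train a gradient-boosted classifier on per-query features (all feature names here are hypothetical) labelled with whether the cheap model was sufficient, then use it to route new queries.

```python
# Generic gradient-boosted query router sketch. The features and labels
# are synthetic and illustrative; a real router would be tuned on logged
# production workloads, as the comment stresses.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Hypothetical features per query: [prompt length, est. complexity, retrieval score]
X = rng.random((200, 3))
# Synthetic labels: 1 = "needs the big model" when complexity is high.
y = (X[:, 1] > 0.6).astype(int)

router = GradientBoostingClassifier(n_estimators=50, random_state=0)
router.fit(X, y)

query = np.array([[0.3, 0.9, 0.2]])   # complex query, weak retrieval hit
model = "gpt-4" if router.predict(query)[0] == 1 else "mixtral"
```

The retrieval score is where workload-specific tuning enters: a router trained on one corpus's retrieval distribution will see very different score distributions in production, which matches the "tests way better than it works" failure mode described above.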

lewtun yesterday at 7:59 PM

> We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB

Academics are pretty creative at naming their creations
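For readers unfamiliar with the LinUCB that PILOT extends: it is a contextual bandit where each "arm" (here, an LLM) keeps a ridge-regression estimate of reward from query features, plus an exploration bonus. This is a minimal sketch of plain LinUCB only - the paper's PILOT variant additionally incorporates a human-preference prior and a budget constraint, which are not shown; arm names and features are illustrative.

```python
# Minimal LinUCB contextual bandit for model routing: pick the arm with
# the highest upper confidence bound on predicted reward (quality).
import numpy as np

class LinUCBRouter:
    def __init__(self, arms, dim, alpha=1.0):
        self.arms = arms
        self.alpha = alpha                            # exploration strength
        self.A = {a: np.eye(dim) for a in arms}       # ridge Gram matrix per arm
        self.b = {a: np.zeros(dim) for a in arms}     # reward-weighted contexts

    def select(self, x):
        """Route query with feature vector x to the arm with highest UCB."""
        def ucb(a):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                 # per-arm reward estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(self.arms, key=ucb)

    def update(self, arm, x, reward):
        """Fold the observed quality score back into the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

router = LinUCBRouter(arms=["mixtral", "gpt-4"], dim=3)
x = np.array([0.2, 0.9, 0.1])        # e.g. length, complexity, code-ness
choice = router.select(x)
router.update(choice, x, reward=0.7)
```

The preference prior in PILOT would replace the zero initialization of the reward estimates with one informed by human preference data, so the router explores less blindly at the start.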

westurner yesterday at 8:36 PM

Would there be advantages to routing to models according to cost in conjunction with prompt rewriting?

andrewflnr yesterday at 5:48 PM

Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.

Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.
