Hacker News

Adaptive LLM routing under budget constraints

178 points by tdchaitanya yesterday at 4:57 PM | 71 comments

Comments

pbd yesterday at 5:49 PM

GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
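The arithmetic in this comment is easy to check. A minimal sketch, using the prices quoted above and treating the 20% error rate as the fraction of queries that end up on the expensive model (a hypothetical, not a benchmark):

```python
# Blended cost per million tokens when a router sends most traffic to a
# cheap model. Prices are the ones quoted in the comment (USD / 1M tokens).
GPT4_COST = 24.70
MIXTRAL_COST = 0.24

def blended_cost(fraction_cheap: float) -> float:
    """Cost per million tokens if `fraction_cheap` of traffic goes to
    Mixtral and the remainder is served by GPT-4."""
    return fraction_cheap * MIXTRAL_COST + (1 - fraction_cheap) * GPT4_COST

cost = blended_cost(0.80)          # 80% routed cheap, 20% on GPT-4
savings = GPT4_COST / cost         # vs. sending everything to GPT-4
# cost ≈ $5.13 per million tokens, i.e. roughly a 4.8x saving even
# with a fifth of traffic landing on the expensive model.
```

The saving is dominated by the price ratio, not the error rate: because GPT-4 is ~100x more expensive, the blended cost is essentially (fraction on GPT-4) × (GPT-4 price).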

QuadmasterXLII yesterday at 6:00 PM

The framing in the headline is interesting. As far as I recall, spending 4x more compute on a model to improve performance by 7% is the move that has worked over and over again up to this point. 101% of GPT-4 performance (potentially at any cost) is what I would expect an improved routing algorithm to achieve.

spoaceman7777 yesterday at 6:15 PM

Incredible that they are using contextual bandits, and named it: Preference-prior Informed Linucb fOr adaptive rouTing (PILOT)

Rather than the much more obvious: Preference-prior Informed Linucb For Adaptive Routing (PILFAR)

fny yesterday at 5:46 PM

Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?

hackathonguy today at 2:48 AM

I'm very curious whether a) anecdotally, anyone has encountered a real enterprise cost-cutting effort focused on LLM APIs and b) empirically, whether anyone has done any research on price elasticity in LLMs of different performance scales.

So far, my experience has been that it's just too early for most people / applications to worry about cost - at most, I've seen AI account for 10% of cloud costs. But very curious if others have other experiences.

danieltanfh95 today at 3:49 AM

Unless your application is relatively trivial, you will almost always want behaviour that is as consistent as possible, rather than optimizing some random metric used as a proxy for "performance". Routing is NOT the solution.

CuriouslyC yesterday at 8:53 PM

These router papers are popping up hard now. I have a gradient boosted router I've been playing with that ties into retrieval to provide adaptive routing. The truth about these routers is that you have to tune them on your workloads to get the full benefit; otherwise they test way better than they work in production. That's why I added the retrieval aspect to mine - without it, your top-line eval numbers and production reality are very different.
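The commenter's actual features and retrieval signal aren't shown, so this is only a generic sketch of the idea: train a gradient-boosted classifier on per-query features (all feature names here are hypothetical) labelled with whether the cheap model was sufficient, then use it to route new queries.

```python
# Generic gradient-boosted query router sketch. The features and labels
# are synthetic and illustrative; a real router would be tuned on logged
# production workloads, as the comment stresses.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Hypothetical features per query: [prompt length, est. complexity, retrieval score]
X = rng.random((200, 3))
# Synthetic labels: 1 = "needs the big model" when complexity is high.
y = (X[:, 1] > 0.6).astype(int)

router = GradientBoostingClassifier(n_estimators=50, random_state=0)
router.fit(X, y)

query = np.array([[0.3, 0.9, 0.2]])   # complex query, weak retrieval hit
model = "gpt-4" if router.predict(query)[0] == 1 else "mixtral"
```

The retrieval score is where workload-specific tuning enters: a router trained on one corpus's retrieval distribution will see very different score distributions in production, which matches the "tests way better than it works" failure mode described above.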

lewtun yesterday at 7:59 PM

> We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB

Academics are pretty creative at naming their creations
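For readers unfamiliar with the LinUCB that PILOT extends: it is a contextual bandit where each "arm" (here, an LLM) keeps a ridge-regression estimate of reward from query features, plus an exploration bonus. This is a minimal sketch of plain LinUCB only - the paper's PILOT variant additionally incorporates a human-preference prior and a budget constraint, which are not shown; arm names and features are illustrative.

```python
# Minimal LinUCB contextual bandit for model routing: pick the arm with
# the highest upper confidence bound on predicted reward (quality).
import numpy as np

class LinUCBRouter:
    def __init__(self, arms, dim, alpha=1.0):
        self.arms = arms
        self.alpha = alpha                            # exploration strength
        self.A = {a: np.eye(dim) for a in arms}       # ridge Gram matrix per arm
        self.b = {a: np.zeros(dim) for a in arms}     # reward-weighted contexts

    def select(self, x):
        """Route query with feature vector x to the arm with highest UCB."""
        def ucb(a):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                 # per-arm reward estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(self.arms, key=ucb)

    def update(self, arm, x, reward):
        """Fold the observed quality score back into the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

router = LinUCBRouter(arms=["mixtral", "gpt-4"], dim=3)
x = np.array([0.2, 0.9, 0.1])        # e.g. length, complexity, code-ness
choice = router.select(x)
router.update(choice, x, reward=0.7)
```

The preference prior in PILOT would replace the zero initialization of the reward estimates with one informed by human preference data, so the router explores less blindly at the start.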

westurner yesterday at 8:36 PM

Would there be advantages to routing to models according to cost in conjunction with prompt rewriting?

andrewflnr yesterday at 5:48 PM

Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.

Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.
