logoalt Hacker News

bobbiechenyesterday at 8:10 PM4 repliesview on HN

If I understand correctly, threat model here seems to be to protect against accidental issues that would impact performance, but doesn't cover malicious actor.

For example, Sketchy Provider tells you they are running the latest and greatest, but actually is knowingly running some cheaper (and worse) model and pocketing the difference. These tests wouldn't help since Sketchy Provider could detect when they're being tested and do the right thing (like the Volkswagen emissions scandal). Right?


Replies

nulltraceyesterday at 10:28 PM

Catching accidental drift is still worth a lot. It's basically the same idea as performance regression tests in CI, nobody writes those because they expect sabotage. It's for the boring stuff, like "oops, we bumped a dep and throughput dropped 15%".

If someone actually goes out of their way to bypass the check, that's a pretty different situation legally compared to just quietly shipping a cheaper quant anyway.

frogpersontoday at 12:18 AM

Providers like OpenRouter default to the cheapest provider. They are often cheap because they are rediculously quantized and tuned for throughput, not quality.

This is probably kimi trying to protect their brand from bargain basement providers that dont properly represent what the models are capable of.

show 1 reply
gpmyesterday at 9:12 PM

Yes and no.

For a truly malicious actor, you're right. But it shifts it from "well we aren't obviously committing fraud by quantizing this model and not telling people" to "we're deliberately committing fraud by verifying our deployment with one model and then serving customer requests with another".

I suspect there's a lot of semi-malicious actors who are only happy to do the former.

j-bosyesterday at 8:15 PM

Seems like a great challenge for all these systems, see fromtier labs serving quants when under hesvy load.