Why not allow the user to provide the seed used for generation? That way we could at least detect a model change if the same prompt with the same seed suddenly gives a new answer (assuming they don't cache answers). You could compare different providers that supposedly serve the same model, and if the model is open-weight you could even compare against your own run on your own hardware or on rented GPUs.
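Roughly what I have in mind, assuming the providers expose an OpenAI-compatible chat completions endpoint that actually honors the optional seed parameter (the endpoints, keys, and model name below are placeholders):

    # Sketch: same prompt + same seed against two providers, flag any divergence.
    import requests

    PROVIDERS = {
        "provider_a": ("https://api.provider-a.example/v1/chat/completions", "KEY_A"),
        "provider_b": ("https://api.provider-b.example/v1/chat/completions", "KEY_B"),
    }

    def complete(url, key, prompt, seed=1234):
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {key}"},
            json={
                "model": "some-open-weight-model",  # placeholder model id
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0,
                "seed": seed,  # only meaningful if the backend honors it
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    prompt = "List the prime numbers below 30."
    answers = {name: complete(url, key, prompt) for name, (url, key) in PROVIDERS.items()}
    if len(set(answers.values())) > 1:
        print("Different outputs for identical prompt and seed; model or config may have changed.")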
I'm somehow more convinced by the method shown in the introduction of the article: run a number of evals across model providers and see how they compare. This also catches any other configuration change an inference provider can make, like KV-cache quantization. And it's easy to understand and talk about, and the threat model is fairly clear (be wary of hard-coded answers to your benchmark if you're really distrustful).
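A minimal sketch of that comparison, assuming some per-provider ask(question) client (for example, built like the seed example above); the two-question eval set is only an illustrative placeholder, not a real benchmark:

    from typing import Callable

    EVAL_SET = [
        ("What is 17 * 23?", "391"),
        ("What is the capital of Australia?", "Canberra"),
    ]

    def accuracy(ask: Callable[[str], str]) -> float:
        """Fraction of eval questions whose expected answer appears in the reply."""
        hits = sum(1 for question, expected in EVAL_SET
                   if expected.lower() in ask(question).lower())
        return hits / len(EVAL_SET)

    # Track each provider's score over time; a sudden drop hints at a silent
    # model swap or config change (quantization, KV-cache settings, ...).
    # scores = {name: accuracy(lambda q: ask_provider(name, q)) for name in providers}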
Of course, attestation is conceptually neat and wastes less compute than repeated benchmarks do. It definitely has its place.
The title here seems very different from the post. All of that verification happens locally only; there's no remote validation at any point. So I'm not sure what the reason is to even apply this check. If you're running the model yourself, you know what you're downloading and can check the hash once to catch transfer problems, and then use other mechanisms to guard against storage bitrot. But you're not proving anything to your users this way.
To do that, you'd need to run a full, public system image with known attestation keys and return some kind of signed response with every request. That's not impossible, but the remote part seems to be completely missing from the description.
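To make that concrete, here's a sketch of the client-side check for such a signed response, assuming the provider publishes an Ed25519 attestation/signing public key and signs every response body (the function and field handling are made up for illustration). The hard part isn't the signature math; it's binding that key to a specific, publicly auditable system image via remote attestation.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def verify_response(public_key_bytes: bytes, body: bytes, signature: bytes) -> bool:
        """Return True iff `signature` is a valid Ed25519 signature over `body`."""
        key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
        try:
            key.verify(signature, body)
            return True
        except InvalidSignature:
            return False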
Is modelwrap running on arbitrary clients? I'm not following the whole post, but how can you maintain confidence in client-owned hardware and disks under the security model the method seems to depend on?
Call me an old fuddy-duddy, but my faith in the quality of your reporting really fell through the floor when I saw that the first image showed Spongebob Squarepants swearing at the worst-performing numbers.
EDIT: I read through the article, and it's a little over my head, but I'm intrigued. Does this actually work?
In my opinion this is very well written
Two comments so far suggesting otherwise and I guess idk what their deal is
Attestation is taking off
I don't understand what stops an inference provider from giving you a hash of whatever they want. None of this proves that's what they're actually running; it only proves they know the correct answer. I can know the correct answer all I want and then just do something different.
Related but distinct: Is there an ELI5 about determinism in inference? In other words, when will the same prompt lead to the same output, and when not? And why not?
The idea is that you run a workload at a model provider that might cheat on you by altering the model they offer, right? So how does this help? If the provider wants to cheat (and they apparently do), couldn't they just swap the modelwrap container, or even pull some shenanigans with the filesystem?
I am ignorant about this ecosystem, so I might be missing something obvious.
https://hellas.ai is building out their category-theoretic compiler and protocol for solving this issue.
Please serve well-quantized models.
If you can get 99 percent of the quality for 50 percent of the cost, that's usually a good tradeoff.