Hacker News

sbszllr · yesterday at 8:32 PM

I agree that it's performant enough for many applications; I work in the field. But it isn't performant enough to run large-scale LLM inference with reasonable latency, especially when you compare the throughput of single-tenant inference inside a TEE against batched non-private inference.
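To make the gap concrete, here's a rough back-of-envelope sketch (all numbers are illustrative placeholders, not measurements): decode is roughly memory-bandwidth-bound, so a batch shares each sweep over the weights, while a single-tenant TEE request pays for the whole sweep alone and additionally eats the CC-mode encryption overhead on I/O.

    # Illustrative back-of-envelope only -- model size, bandwidth, and the
    # assumed ~10% TEE I/O overhead are all placeholder assumptions.
    weights_gb = 140          # e.g. a ~70B-parameter model in fp16
    hbm_bw_gbs = 3350         # H100 SXM HBM3 bandwidth, GB/s
    tee_overhead = 0.90       # assumed throughput factor for CC-mode encryption

    def decode_tok_per_sec(batch_size: int) -> float:
        # each decode step streams the full weights once, shared by the batch
        steps_per_sec = hbm_bw_gbs / weights_gb
        return steps_per_sec * batch_size

    single_tenant_tee = decode_tok_per_sec(1) * tee_overhead
    batched_plain = decode_tok_per_sec(64)
    print(f"single-tenant TEE: ~{single_tenant_tee:.0f} tok/s")
    print(f"batch-64 non-private: ~{batched_plain:.0f} tok/s")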


Replies

ramoz · yesterday at 8:39 PM

We just served DeepSeek-R1 on this bad boy in CC + TEE, with an integrated signing layer we developed for vLLM.
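For anyone curious what a signing layer can look like mechanically, here's a minimal sketch (hypothetical, not our exact code): it assumes vLLM's OpenAI-compatible HTTP server on its default port and a TEE-resident Ed25519 key whose public half would be bound to the enclave's attestation report.

    # Hypothetical sketch only -- endpoint, model id, and key handling are
    # assumptions; real key provisioning/attestation binding is omitted.
    import hashlib, json
    import requests
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    signing_key = Ed25519PrivateKey.generate()  # stand-in for the TEE-resident key

    def signed_completion(prompt: str) -> dict:
        resp = requests.post(
            "http://localhost:8000/v1/completions",  # vLLM OpenAI-compatible API
            json={"model": "deepseek-ai/DeepSeek-R1", "prompt": prompt},
            timeout=600,
        ).json()
        # sign a digest of the canonicalized response so clients can verify
        # the output came out of this enclave unmodified
        digest = hashlib.sha256(json.dumps(resp, sort_keys=True).encode()).digest()
        return {"response": resp, "signature": signing_key.sign(digest).hex()}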

https://pasteboard.co/k1hjwT7pWI6x.png

Reach out if you're interested in a collab.