Hacker News

Zebra-Llama: Towards Efficient Hybrid Models

59 points by mirrir today at 8:15 PM | 29 comments

Comments

AlexCoventry today at 11:39 PM

This is from May 2025, according to the arxiv watermark. Maybe that should be mentioned in the title.

adityashankar today at 9:45 PM

Due to perverse incentives and the long history of models over-claiming accuracy, it's very hard to believe anything until it is open source and can be tested out.

That being said, I do very much believe that the computational efficiency of models is going to go up* drastically over the coming months, which does pose interesting questions about Nvidia's throne.

*previously miswrote and said computational efficiency will go down

a_wild_dandan today at 10:45 PM

If the claims in the abstract are true, then this is legitimately revolutionary. I don’t believe it. There are probably some major constraints/caveats that keep these results from generalizing. I’ll read through the paper carefully this time instead of just skimming it, and come back with thoughts after I’ve digested it.

xer today at 10:47 PM

This is great! But what if the US invests 1% of GDP in GPU datacenters and then those turn out not to be needed because someone creates a much more efficient architecture?

Reubend today at 10:35 PM

It would be REALLY cool to see this same technique applied to distill a much more recent OSS model. For example, Mistral 3 14B would be a great target. How efficient can we get inference there?

mason_mpls today at 9:44 PM

> Zebra-Llama achieves Transformer-level accuracy with near-SSM efficiency using only 7–11B training tokens (compared to trillions of tokens required for pre-training) and an 8B teacher. Moreover, Zebra-Llama dramatically reduces KV cache size—down to 3.9%, 2%, and 2.73% of the original for the 1B, 3B, and 8B variants, respectively—while preserving 100%, 100%, and 97% of average zero-shot performance on LM Harness tasks.

This is an extraordinary claim. Is there a catch I’m missing? Am I misreading?
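
For a rough sense of where numbers like that could come from, here's a back-of-the-envelope sketch (my own arithmetic, not the paper's configuration): the KV cache grows with the number of attention layers, so if most layers are swapped for SSM blocks with constant-size state, the per-token cache shrinks roughly in proportion to the attention layers that remain. The layer split below is hypothetical; the Llama-3.2-1B head/dimension numbers are the publicly documented ones.

    # Back-of-the-envelope KV-cache arithmetic (sketch; the hybrid's layer split
    # is an assumption, not the paper's actual configuration).
    def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
        """KV-cache bytes for the attention layers only: 2 tensors (K and V) per layer."""
        return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

    # Llama-3.2-1B-style config: 16 layers, 8 KV heads, head_dim 64, fp16 cache.
    seq_len = 8192
    full = kv_cache_bytes(16, 8, 64, seq_len)    # all 16 layers use standard attention
    hybrid = kv_cache_bytes(2, 8, 64, seq_len)   # hypothetical: only 2 attention layers kept

    print(f"full:   {full / 2**20:.0f} MiB")
    print(f"hybrid: {hybrid / 2**20:.0f} MiB ({hybrid / full:.1%} of full)")
    # -> 256 MiB vs 32 MiB, i.e. 12.5%. Reaching the ~2-4% the abstract reports would
    #    also need the remaining attention layers to be compressed (fewer KV heads or
    #    a smaller per-token latent), not just fewer of them.

So the headline ratios are at least dimensionally plausible; the surprising part is preserving ~100% of zero-shot accuracy while doing it.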
