Hacker News

purple-leafy today at 4:59 AM

Dumb question: can you train a model to predict the next byte of ANOTHER MODEL?

So apply this same logic to compressing a bigger model within a smaller model

I know this is absolutely regarded, but humour me please


Replies

anyg today at 5:06 AM

Not dumb at all. It's a whole field of active research - Speculative Decoding. A recent paper goes one level deeper with Speculative Speculative Decoding - https://arxiv.org/abs/2603.03251
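
For readers who haven't met the term, here is a minimal sketch of the greedy form of speculative decoding: a cheap draft model guesses a few tokens ahead, and the expensive target model only has to check them. Everything in it is an assumption made for illustration: `toy_target` and `toy_draft` are hypothetical stand-in functions rather than real networks, and a real system would verify all drafted positions in one batched forward pass and usually accept tokens by sampling rather than exact match.

```python
from typing import Callable, List

# Hypothetical model type: maps a token prefix to the next token (greedy choice).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Draft proposes up to k tokens; target keeps the prefix it agrees with."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model guesses a short continuation.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each guessed position in order. On the first
        #    disagreement its own token is kept and the rest is discarded.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)
            if expected != tok:
                break
        out.extend(accepted)
    return out


def toy_target(ctx: List[int]) -> int:
    # Stand-in for the large model (assumption, not a real network).
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    # Stand-in for the small model: agrees with the target except after a 5.
    return 0 if ctx[-1] == 5 else toy_target(ctx)


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1], 10))
```

The point of the design is that the output is identical to what the target model would have produced on its own; the draft model only changes how many target steps can be checked together.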

userbinator today at 5:53 AM

If there's any redundancy in the model that can be compressed (similar to how RLE is used to compress the static Huffman tree in FLATE), that's possible. But it's not necessary if the model is being trained on the input dynamically, which is what Bellard's NNCP does.
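
To make the RLE comparison concrete, here is a rough sketch of how DEFLATE run-length encodes the list of Huffman code lengths that describes a tree, using the repeat symbols from RFC 1951 (16 copies the previous length 3 to 6 times, 17 and 18 emit runs of 3 to 10 and 11 to 138 zeros). The function names and the `(symbol, extra)` tuple representation are assumptions for illustration; a real encoder would go on to Huffman-code these symbols and pack the extra values into bits, which is omitted here.

```python
from typing import List, Tuple

# (symbol, extra): extra is the repeat count minus the symbol's base count.
Sym = Tuple[int, int]


def rle_encode(lengths: List[int]) -> List[Sym]:
    """Run-length encode code lengths (0..15) with DEFLATE's symbols 16/17/18."""
    out: List[Sym] = []
    i, n = 0, len(lengths)
    while i < n:
        cur = lengths[i]
        run = 1
        while i + run < n and lengths[i + run] == cur:
            run += 1
        i += run
        if cur == 0:
            while run >= 11:              # symbol 18: 11..138 zeros
                r = min(run, 138)
                out.append((18, r - 11))
                run -= r
            if run >= 3:                  # symbol 17: 3..10 zeros
                out.append((17, run - 3))
                run = 0
            out.extend([(0, 0)] * run)    # leftover zeros sent literally
        else:
            out.append((cur, 0))          # first length of the run is literal
            run -= 1
            while run >= 3:               # symbol 16: repeat previous 3..6 times
                r = min(run, 6)
                out.append((16, r - 3))
                run -= r
            out.extend([(cur, 0)] * run)  # leftover repeats sent literally
    return out


def rle_decode(symbols: List[Sym]) -> List[int]:
    """Inverse of rle_encode."""
    out: List[int] = []
    for sym, extra in symbols:
        if sym <= 15:
            out.append(sym)
        elif sym == 16:
            out.extend([out[-1]] * (extra + 3))
        elif sym == 17:
            out.extend([0] * (extra + 3))
        else:
            out.extend([0] * (extra + 11))
    return out


if __name__ == "__main__":
    # Code lengths of DEFLATE's fixed literal/length code (RFC 1951, 3.2.6).
    fixed = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8
    encoded = rle_encode(fixed)
    print(len(fixed), "lengths ->", len(encoded), "symbols")
```

The same idea carries over to model weights only to the extent that they contain long runs or other cheap-to-describe structure, which is the "if there's any redundancy" condition above.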