Hacker News

aabdi, today at 5:06 PM

This is mostly a decompression step, and it's fairly standard nowadays. The point is to get the data from the internal compressed representation into a human-usable form.

We can technically reason over pixel- or char-level encodings, but that's generally going to be much more expensive. Think of the overall technique as a way to make the computer go faster.
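
A rough sketch of the cost argument, under assumptions that are mine and only for illustration: per-layer cost is taken to grow with the square of sequence length (as in plain self-attention), and a whitespace word split stands in for a real learned encoder. The `decode` helper is hypothetical and just plays the role of the decompression step.

```python
text = "the quick brown fox jumps over the lazy dog"

# Toy "compressed" encoding: one integer id per word (stand-in for a real tokenizer).
vocab = sorted(set(text.split()))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = [word_to_id[w] for w in text.split()]

def decode(token_ids):
    # The "decompression" step: internal ids back to human-usable text.
    return " ".join(vocab[i] for i in token_ids)

def attention_cost(seq_len):
    # Assumption: cost scales with the number of pairwise interactions.
    return seq_len * seq_len

char_cost = attention_cost(len(text))   # one unit per character
token_cost = attention_cost(len(ids))   # one unit per word id

print(decode(ids))
print(char_cost, token_cost)
```

Same sentence, far fewer units to reason over in the compressed form, and you only pay for the decode at the end.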

You see it with the Qwen talker, most multimodal projectors, etc.