Hacker News

eru, today at 8:37 AM (2 replies)

No, LLMs only do this for language. They don't try to do this for arbitrary data.


Replies

energy123, today at 9:57 AM

Transformers do this for any stream of tokens. Those tokens can map to anything you want, and you will still get lossy compression. Text produced by humans just happens to be dense, available, and a useful prior, but it is not intrinsically required. See 3D vision transformers, for example.

woadwarrior01, today at 8:47 AM

There are many ways around this, the simplest being to treat bytes as tokens (cf. Google's ByT5[1]). See also BLT[2] from Meta and ByteFormer[3] from Apple.

[1]: https://arxiv.org/abs/2105.13626

[2]: https://arxiv.org/abs/2412.09871

[3]: https://arxiv.org/abs/2306.00238
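
To make "treat bytes as tokens" concrete, here is a minimal sketch of the general idea. It is not the actual tokenizer of ByT5, BLT, or ByteFormer: the function names and the `NUM_SPECIAL` offset are hypothetical, chosen only to show that any byte stream, text or not, maps onto a fixed vocabulary of 256 ids plus a few reserved ones.

```python
# Hypothetical sketch of byte-level tokenization for arbitrary data.
# NUM_SPECIAL and both function names are illustrative assumptions,
# not the API of any particular library or model.
NUM_SPECIAL = 3  # assumed reserved ids (e.g. padding, end-of-sequence, unknown)
VOCAB_SIZE = 256 + NUM_SPECIAL  # one id per possible byte value, plus reserved ids


def bytes_to_tokens(data: bytes) -> list[int]:
    """Map each byte (0..255) to a token id shifted past the reserved ids."""
    return [b + NUM_SPECIAL for b in data]


def tokens_to_bytes(tokens: list[int]) -> bytes:
    """Invert bytes_to_tokens, dropping any reserved ids."""
    return bytes(t - NUM_SPECIAL for t in tokens if t >= NUM_SPECIAL)


# Text goes through its UTF-8 encoding; "é" becomes two byte tokens.
text_tokens = bytes_to_tokens("héllo".encode("utf-8"))

# Non-text data works the same way: the first four bytes of the PNG signature.
blob_tokens = bytes_to_tokens(bytes([0x89, 0x50, 0x4E, 0x47]))

print(len(text_tokens), blob_tokens)
```

The trade-off with this kind of scheme is sequence length: there is no out-of-vocabulary problem, but every byte costs one token, so sequences get much longer than with a subword vocabulary.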