bionhoward • 04/02/2025 • 1 reply

How does this compare with Byte Latent Transformer [1]? Is the difference that this uses convolution after the embedding, while BLT uses attention at embedding time?

1. https://ai.meta.com/research/publications/byte-latent-transf...

Replies

janalsncm • 04/02/2025

As I understand it, BLT uses a small neural net to tokenize but doesn’t change the attention mechanism. MTA (Multi-Token Attention) uses traditional BPE for tokenization but changes the attention mechanism. You could use both (latency be damned!)
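To make that contrast concrete, here is a toy sketch in plain Python. `entropy_patches` mimics the BLT idea of grouping bytes into patches wherever a small byte-level model's next-byte entropy crosses a threshold (the entropies are assumed inputs from that model, and the threshold rule is simplified). `conv_causal_attention` keeps ordinary token inputs but convolves the causal attention logits over nearby earlier query and key positions before the softmax, which is the general idea behind MTA. The kernel shape, aligning it over preceding positions only, and zeroing future logits before the convolution are my simplifying assumptions, not the paper's exact formulation, and every function name here is hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def entropy_patches(entropies: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical BLT-style patching: a new patch starts at byte i whenever
    the small byte-level model's next-byte entropy at i exceeds `threshold`.
    Returns the start index of each patch (byte 0 always starts one)."""
    starts = [0]
    for i in range(1, len(entropies)):
        if entropies[i] > threshold:
            starts.append(i)
    return starts


def softmax(row: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability; exp(-inf) is 0.0, so
    # masked (future) positions receive zero weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def conv_causal_attention(q: Matrix, k: Matrix, v: Matrix, kernel: Matrix) -> Matrix:
    """Hypothetical MTA-style attention: causal scaled QK^T logits are
    convolved over earlier query rows and key columns before the softmax."""
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    # Causal logits; future (j > i) cells are zeroed so they add nothing to the conv.
    logits = [
        [sum(a * b for a, b in zip(q[i], k[j])) * scale if j <= i else 0.0 for j in range(n)]
        for i in range(n)
    ]
    cq, ck = len(kernel), len(kernel[0])
    # Start every cell at -inf so the future stays masked after the conv.
    mixed = [[-math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            total = 0.0
            for a in range(cq):          # offset over earlier queries
                for b in range(ck):      # offset over earlier keys
                    ii, jj = i - a, j - b
                    if ii >= 0 and jj >= 0:
                        total += kernel[a][b] * logits[ii][jj]
            mixed[i][j] = total
    # Softmax each row, then take the weighted sum of value vectors.
    out = []
    for i in range(n):
        w = softmax(mixed[i])
        out.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    print(entropy_patches([0.5, 2.0, 0.1, 0.3, 3.0, 0.2], threshold=1.0))  # [0, 1, 4]
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(conv_causal_attention(q, k, v, kernel=[[1.0, 0.5], [0.5, 0.0]]))
```

With a 1×1 kernel of `[[1.0]]` the second function reduces to ordinary causal softmax attention; a larger kernel lets each score borrow evidence from neighboring queries and keys. That is the sense in which one approach changes how the input is chunked and the other changes attention itself, so in principle they compose.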
