
SubiculumCode • today at 3:44 AM • 1 reply

What do those compress to with conventional approaches? For comparison.

I am curious. A classic machine learning ensemble approach is to overfit a collection of small models and then bag them (e.g. by voting), which allows the ensemble to generalize.

I'm sure someone's tried to overfit a bunch of transformers for compression like this, then bag them to see how well it does?
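
For readers unfamiliar with the bagging idea mentioned above, here is a minimal sketch, not anything from the linked work. It assumes a toy 1-D binary classification task with noisy labels, uses 1-nearest-neighbour as a stand-in for a deliberately overfit small model, and combines the models by majority vote. All names (`fit_1nn`, `bagged_predict`, `make_data`) are hypothetical, and nothing here involves transformers or compression.

```python
import random
from collections import Counter


def fit_1nn(sample):
    """An intentionally overfit small model: memorise the sample (1-nearest-neighbour)."""
    def predict(x):
        # Return the label of the closest memorised training point.
        nearest = min(sample, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict


def bagged_predict(models, x):
    """Majority vote over the individual models' predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def make_data(rng, n, noise=0.2):
    """Toy data (an assumption for illustration): label is 1 if x > 0, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        label = int(x > 0)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


rng = random.Random(0)
train = make_data(rng, 200)

# Each model is fit on its own bootstrap resample (sampling with replacement).
models = [fit_1nn([rng.choice(train) for _ in train]) for _ in range(25)]

# The ensemble's answer for a query point is whatever most models vote for.
print(bagged_predict(models, 0.5))
```

Note that every model in the ensemble has to be stored and evaluated for each prediction, so the cost grows linearly with the number of models.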

Replies

gwern • today at 4:47 AM

Ensembling is not compute- or parameter-efficient, so compression per se is a terrible application. (This is related to why people train ever-larger LLMs, such as one 10T-parameter LLM, rather than 100 GPT-3-scale LLMs.)
