Hacker News

garyfirestorm · 10/12/2024

why not llama3.2:3B? it has a fairly large context window too


Replies

reissbaker · 10/12/2024

I assume it's because the 8B model is smarter than the 3B model; the 8B outperforms it on almost every benchmark: https://huggingface.co/meta-llama/Llama-3.2-3B

If you have the compute, might as well use the better model :)

The 3.2 series wasn't the kind of leap in intelligence that 3.0 -> 3.1 was; it was just:

1. Meta releasing multimodal vision models for the first time (11B and 90B), and

2. Meta releasing much smaller models than the 3.1 series (1B and 3B).