why not llama3.2:3B? it has fairly large context window too
I assume because the 8B model is smarter than the 3B model; it outperforms it on almost every benchmark: https://huggingface.co/meta-llama/Llama-3.2-3B
If you have the compute, might as well use the better model :)
The 3.2 series wasn't the kind of leap that 3.0 -> 3.1 was in terms of intelligence; it was just:
1. Meta releasing multimodal vision models for the first time (11B and 90B), and
2. Meta releasing much smaller models than the 3.1 series (1B and 3B).