why not llama3.2:3B? it has fairly large context window too
I assume because the 8B model is smarter than the 3B model; it outperforms it on almost every benchmark: https://huggingface.co/meta-llama/Llama-3.2-3B
If you have the compute, might as well use the better model :)
The 3.2 series wasn't the kind of leap that 3.0 -> 3.1 was in terms of intelligence; it was just:
1. Meta releasing multimodal vision models for the first time (11B and 90B), and
2. Meta releasing much smaller models than the 3.1 series (1B and 3B).