This is big. The first really big open-weights model that understands images.
How is this different from Llama 3.2 "vision capabilities"?
https://www.llama.com/docs/how-to-guides/vision-capabilities...