If the claims on multilingual and pretraining performance are accurate, this is huge! This may be the best-in-class multilingual model since the more recent Gemma releases, which used to be unmatched on that front. I know Americans don't care much about the rest of the world, but we're still using our native tongues, thank you very much; there is a huge issue with e.g. Ukrainian (as opposed to Russian) being underrepresented in many open-weight and weight-available models. Gemma used to be a notable exception; I wonder if that's still the case. On a different note: I wonder why the 14b model's TriviaQA score lags behind Gemma 12b's so much; that one is not a formatting-heavy benchmark.
> I wonder why the 14b model's TriviaQA score lags behind Gemma 12b's so much; that one is not a formatting-heavy benchmark.
My guess is the vast scale of Google's data. They've been hoovering up data for decades now, and have had curation pipelines (guided by real human interactions) since forever.