It's not just a model of text, though. It's a model of multiple types of data. Pretty much all modern models are multimodal.