I feel like text models are already at a sufficiently entertaining and useful quality, as you define it. It's definitely possible we never get there for video or 3D modalities, but I think the economic incentives are strong enough that big tech will dump tens of billions of dollars into achieving it.
I don't know why you think that's the case regarding text models. If it were, there would be articles on here created entirely by generative AI and nobody would know the difference. It's pretty obvious that's not happening yet, not least because I know what kinds of slop state-of-the-art generative models still produce when you give them open-ended prompts.