I think you need to go back and rewatch Will Smith eating spaghetti. These examples are far from perfect, and this probably isn't the best model right now, but they're far better than you're giving them credit for.
As far as I know, this might be the most advanced text-to-video model that has been released? I'm not sure whether the license will qualify as open enough in everyone's eyes, though.