As with music generation models, the main thing that might make "open source" models better is that they don't bother excluding copyrighted material from their training data. That gives them a good starting point, instead of a dataset made up of YouTube videos and stock footage.