The license[0] seems quite restrictive, limiting its use to non-commercial research. It doesn't meet the open source definition, so it's more appropriate to call it weights-available.
[0]https://github.com/apple/ml-starflow/blob/main/LICENSE_MODEL
Looking at the text-to-video examples (https://starflow-v.github.io/#text-to-video), I'm not impressed. They gave me the feeling of the early Will Smith eating spaghetti videos.
Did I miss anything?
> STARFlow-V is trained on 96 H100 GPUs using approximately 20 million videos.
They don’t say for how long.
Looks good. I wonder what use case Apple has in mind, though. Or I suppose this is just what the researchers themselves were interested in, perhaps due to the current zeitgeist. I'm not really sure how research works at big tech companies; are there top-down mandates?
> Model Release Timeline: Pretrained checkpoints will be released soon. Please check back or watch this repository for updates.
> The checkpoint files are not included in this repository due to size constraints.
So it's not actually open weights yet. Maybe it will be eventually, once they actually release the weights. "Soon."
Hopefully this will make it into some useful feature in the ecosystem and not just contribute more terrible slop. Apple has so far saved itself from the destruction of quality and taste that these models enabled; I hope it stays that way.
"VAE: WAN2.2-VAE" so it's just a Wan2.2 edit, compressed to 7B.
<joke> GGUF when? </joke>
Apple has a video understanding model too. I can't wait to find out what accessibility features they'll build with these models. As a blind person, AI has changed my life.