Hacker News

Apple Releases Open Weights Video Model

240 points by vessenes today at 5:10 AM | 63 comments

Comments

devinprater today at 8:34 AM

Apple has a video understanding model too. I can't wait to find out what accessibility stuff they'll do with the models. As a blind person, AI has changed my life.

RobotToaster today at 9:02 AM

The license[0] seems quite restrictive, limiting its use to non-commercial research. It doesn't meet the open source definition, so it's more appropriate to call it "weights available".

[0] https://github.com/apple/ml-starflow/blob/main/LICENSE_MODEL

yegle today at 8:50 AM

Looking at the text-to-video examples (https://starflow-v.github.io/#text-to-video), I'm not impressed. They gave me the feeling of the early Will Smith noodles videos.

Did I miss anything?

coolspot today at 7:40 AM

> STARFlow-V is trained on 96 H100 GPUs using approximately 20 million videos.

They don’t say for how long.

satvikpendem today at 8:07 AM

Looks good. I wonder what use case Apple has in mind though. Or I suppose this is just what the researchers themselves were interested in, perhaps due to the current zeitgeist. I'm not really sure how research works at big tech companies. Are there top-down mandates?

LoganDark today at 12:35 PM

> Model Release Timeline: Pretrained checkpoints will be released soon. Please check back or watch this repository for updates.

> The checkpoint files are not included in this repository due to size constraints.

So it's not actually open weights yet. Maybe eventually once they actually release the weights it will be. "Soon"

nothrowaways today at 8:25 AM

Where do they get the video training data?

camillomiller today at 8:47 AM

Hopefully this will make it into some useful feature in the ecosystem and not just contribute to more terrible slop. Apple has saved itself from the destruction of quality and taste that these models enabled. I hope it stays that way.

ai_updates today at 8:58 AM

[flagged]

mdrzn today at 9:23 AM

"VAE: WAN2.2-VAE" so it's just a Wan2.2 edit, compressed to 7B.

pulse7 today at 10:16 AM

<joke> GGUF when? </joke>