Hacker News

throwaway314155 12/10/2024

As far as I can tell, this is not Sora, but a distilled model that runs in a reasonable amount of time, with a reasonable amount of compute. It's pretty likely that's resulted in degradation of quality. Further, the marketing/demo-ing for Sora for the past year has been heavily curated videos from OpenAI, and it's not clear what was generated using Sora and what was generated using Sora "Turbo" (this distilled model). It wouldn't surprise me if some or much of it was from the original Sora model, leading to mismatched expectations and hype fatigue.

Mostly hunches from me. It could very well be that the original Sora is also plagued with outputs that aren't just subjectively "bad", but which aren't _useful_ (not adhering to the prompt, for instance).

There are some cool ideas here. The storyboard thing is nifty - kind of the refined synthetic captions that ChatGPT uses for DALL-E 3, on crack. Perhaps after people get over the prompting learning curve it will output better results. But it seems tougher to prompt than simple text-to-image, generally requiring longer prompts that aim to steer the model away from whatever strange thing it's doing that you don't need it to do. In my case, using the "image as the first frame" approach, the model consistently generated cuts between newly imagined cameras, when I simply wanted a single continuous shot from the POV of the camera that took the photo.

We'll see, but I'm sort of over it. The UX is fancy for sure, and the scale they're pulling off with this is unprecedented, even if there are already decent competitors.


Replies

gloosx 12/10/2024

Read a few other reviews as well; the general feeling seems to be more or less the same. People also complain that it often imagines faces on reference pictures and, after a substantial delay, refuses to generate, which is a big game-ender.
