Hacker News

telenardo 12/10/2024 (3 replies)

For those curious (and still locked out), here’s a direct comparison of Sora vs. the open-source leaders (HunyuanVideo, Mochi and LTX):

https://app.checkbin.dev/snapshots/1f0f3ce3-6a30-4c1a-870e-2...

Pros:

- Some of the Sora results are absolutely stunning. Check out the detail on the lion, for example!
- The landscapes and aerial shots are absolutely incredible.
- Quality is much better than Mochi & LTX out of the box. Mochi/LTX seem to require specifically optimized workflows (I've seen great img2vid LTX results on Reddit that start with Flux image generations, for example). Hunyuan seems comparable to Sora!

Cons:

- Still nearly impossible to access Sora despite the “launch”. My generations today were in the 2000s, implying that it’s only open to a very small number of people. There’s no API yet, so it’s not an option for developers.
- Sora struggles with physical interactions. Watch the dancers moonwalk, or the ball go through the dog. HunyuanVideo seems to be a bit better in this regard.
- Can't run it locally (obviously).
- I haven't tested this, but I think it's safe to assume Sora will be censored extensively. HunyuanVideo is surprisingly open (I've seen NSFW generations!)
- I’m getting weird camera angles from Sora, but that could likely be solved with better prompting.

Overall, I’d say it’s the best model I've played with, though I haven’t spent much time on other non-open-source ones. Hunyuan gives it a run for its money, though!


Replies

spondyl 12/10/2024

I can't speak to any of those videos in a technical sense but personally, I don't feel like any of them are good?

The vibe they give me is similar to the iPhone photography commercials where yes, in theory, a picnic in the park could look exactly like this except for all the parts that seem movie perfect.

I guess it's really more of a colour grading question where most of the Sora colour grading triggers that part of my brain that says "I'm watching a movie and this isn't real" without quite realising why.

A few of the Hunyuan videos in contrast seem a bit more believable even though they have some obvious glitches at times.

The other thing I think Sora has is that thing in commercials where no one else except the protagonist exists and nothing is ever inconvenient. The video of the teacher in a classroom with no students reminds me of that as well as the picnic in the park where there's wide open space with no one around.

I suppose it depends if the goal is to generate believable video and how you define believable.

zuminator 12/10/2024

Hunyuan was more realistic but lower quality than Sora: shorter videos with lower resolution or bitrate. The downside to Sora's sharpness is that it makes mistakes more apparent. Also funny that Sora didn't understand the rolling dunes metaphor.

CSMastermind 12/10/2024

Based on this, it really seems like Hunyuan is a significantly better model. In nearly every example I preferred its output.