Hacker News

minimaxir 12/09/2024

Way back in the days of GPT-2, there was an expectation that you'd need to cherry-pick at least 10% of your output to get something usable/coherent. GPT-3 and ChatGPT greatly reduced the need to cherry-pick, for better or for worse.

All the generated video startups seem to generate videos with much lower than 10% usable output, without significant human-guided edits. Given the massive amount of compute needed to generate a video relative to hyperoptimized LLMs, the quality issue will handicap gen video for the foreseeable future.


Replies

joe_the_user 12/09/2024

Plus, editing text or an image is practical. Video editors are typically used to cut and paste video streams - a video editor can't fix a stream of video that gets motion or anatomy wrong.