Those are indeed 3 papers. | alt Hacker News

dartos • 12/09/2024

Those are indeed 3 papers.

Replies

GistNoesis • 12/09/2024

Yes, in a nutshell they explain that you can express a picture or a video with relatively little discrete information.

The first paper is the most famous and prompted a lot of research into using text generation tools in the image generation domain: 256 "words" for an image. The second paper is 24 reference images per minute of video. The third paper is a refinement of the first, saying you only need 32 "tokens". I'll let you multiply the numbers.
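For anyone who wants the multiplication spelled out, here is a minimal sketch. It takes the three figures above at face value (they are this comment's summary, not numbers re-checked against the papers), and it assumes the per-image token counts can simply be combined with the reference-image rate. The constant and function names are made up for illustration.

```python
# Figures as quoted in the comment above; none are re-derived from the papers.
TOKENS_PER_IMAGE_FIRST = 256      # first paper: 256 "words" per image
TOKENS_PER_IMAGE_REFINED = 32     # third paper: 32 "tokens" per image
REFERENCE_IMAGES_PER_MINUTE = 24  # second paper: 24 reference images per minute


def tokens_per_minute(tokens_per_image: int,
                      images_per_minute: int = REFERENCE_IMAGES_PER_MINUTE) -> int:
    """Discrete tokens needed for one minute of video.

    Assumption: every reference image is encoded independently with the
    same number of tokens, so the two figures just multiply.
    """
    return tokens_per_image * images_per_minute


print(tokens_per_minute(TOKENS_PER_IMAGE_FIRST))    # 6144
print(tokens_per_minute(TOKENS_PER_IMAGE_REFINED))  # 768
```

Under those assumptions a minute of video comes out at a few hundred to a few thousand tokens, which is roughly the length of a few pages of text.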

It is kind of the same as a "who's who" guessing game, where you can identify any human on earth with ~32 bits of information, i.e. about 32 yes/no answers.
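The bit count is easy to sanity-check. The sketch below assumes a world population of roughly 8 billion (my figure, not from the papers) and that each yes/no question splits the remaining candidates perfectly in half; the function name is hypothetical.

```python
def bits_to_identify(population: int) -> int:
    """Smallest number of yes/no answers that can single out one item
    among `population` candidates, i.e. ceil(log2(population)).

    Uses integer bit_length to avoid floating-point rounding.
    """
    if population < 1:
        raise ValueError("population must be at least 1")
    return (population - 1).bit_length()


WORLD_POPULATION = 8_000_000_000  # assumption: roughly 8 billion people

print(2 ** 32)                             # 4294967296 people distinguishable with 32 bits
print(bits_to_identify(WORLD_POPULATION))  # 33
```

So 32 bits covers about 4.3 billion people and 33 bits covers everyone, which is why the figure is only approximate.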

The corollary is that, contrary to what the parent comment says, there is no theoretical obstacle to obtaining a video from a textual description.
