Can't you just give it a photo of a dog, and then say "use this dog in this or that scene"?
How would that even work? A dog has physical features (legs, nose, eyes, ears, etc.) that it uses to interact with the world around it (ground, trees, grass, sounds, etc.), and each of those features is built on physical structures that make up its senses (nervous system, optic nerves, etc.). There are layers upon layers of intricate complexity that took eons to develop, and a single photo cannot encapsulate that level of complexity and density of information. Even a 3D scan can't capture it. There is an implicit understanding of the physical world that helps us make sense of images. For example, a dog with all four paws standing on grass is within the bounds of possibility; a dog with six paws, two of which are on its head, is outside them. An image generator doesn't understand that obvious delineation and just approximates likelihood.
Yes, the idea works and was explored with DreamBooth and textual inversion for image diffusion models.
https://dreambooth.github.io/ https://textual-inversion.github.io/
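To make that concrete, here is a minimal sketch of what the inference side looks like once a concept has been learned, assuming the Hugging Face diffusers library; the model ID, the embedding path, and the `<my-dog>` token are placeholders, not anything prescribed by the projects above. The actual learning step (fine-tuning on a handful of photos of the specific dog) happens beforehand with the DreamBooth or textual-inversion training scripts.

```python
# Sketch: generating a scene with a previously learned concept (textual inversion).
# Assumes the Hugging Face `diffusers` library; model/repo IDs are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # base diffusion model (placeholder ID)
    torch_dtype=torch.float16,
).to("cuda")

# Load an embedding trained on a handful of photos of one specific dog.
# Training associates a new token, e.g. "<my-dog>", with that dog's appearance.
pipe.load_textual_inversion("path/to/my-dog-embedding")  # placeholder path

# The learned token can now be dropped into ordinary prompts.
image = pipe("a photo of <my-dog> surfing a wave at sunset").images[0]
image.save("dog_surfing.png")
```

Textual inversion learns only a small new token embedding, while DreamBooth fine-tunes the model weights themselves; either way the pretrained model supplies the general knowledge of scenes, and the extra training supplies the identity of the specific dog.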