Hacker News

doctorpangloss · 11/08/2024 · 1 reply

Most people are using LoRAs as a solution for IP transfer.

Thing is, Ideogram v2 has already achieved IP transfer without fine-tuning or adapters. So we know those aren't needed.
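For readers unfamiliar with the adapters being dismissed here: a LoRA keeps a layer's pretrained weight matrix W frozen and trains only a low-rank update, so the layer computes W x + (alpha / r) B A x. The toy pure-Python sketch below assumes that standard formulation from the original LoRA paper (random A, zero-initialized B, alpha / r scaling). The class name `LoRALinear` and its methods are hypothetical; this is not the code of any particular diffusion toolkit, which would typically attach such adapters to attention projections.

```python
import random


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical toy layer: frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight                  # frozen base weights, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts small and random; B starts at zero, so the adapter is a no-op at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)                  # W x, never updated
        delta = matvec(self.B, matvec(self.A, x))      # B (A x), the only trained path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (rank x d_in) and B (d_out x rank) would be trained.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d = 512
    layer = LoRALinear([[0.0] * d for _ in range(d)], rank=8)
    print(layer.trainable_params(), "trainable vs", d * d, "frozen")
```

With d_in = d_out = 512 and rank 8, the adapter trains 8 × 512 + 512 × 8 = 8,192 parameters instead of the 262,144 in the full matrix, which is part of why such fine-tunes are cheap.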

Is Ideogram v2 an exotic architecture? No, I don't think so.

Are there exotic architectures that will solve IP transfer and other tasks? Yes: the Chameleon and OmniGen architectures. Lots of expertise went into SD3 and Flux dataset prep, but the multimodal architectures are far more flexible and expressive.

Flow-matching models may be the last of their kind we see before multimodal models go big.

What to make of things in the community? How is it possible that random hyperparameters and 30-minute fine-tunes produce good results?

(1) DreamBooth effect: if the subject is, like, a dog, you won't notice the flaws.

(2) File-drawer problem: nobody publishes the 99 things that didn't work.

(3) SD models before SD3 struggled with IP transfer on image content that could not possibly have been in their datasets. But laypeople are not doing that: they rarely have access to art that Stability and BFL don't also have.

(4) Faces: of course the SD family saw celebrity images. Faces are over-represented in its datasets, so yes, it's going to be good at IP transfer of photographic faces. Most are in-sample.
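Point (2) is a selection effect, and a toy simulation makes it concrete. The sketch below is hypothetical: it assumes every fine-tuning run has the same true quality and that each observed score is that quality plus Gaussian noise. The function name and every parameter value are made up for illustration, not measured from any real fine-tunes. Publishing only the best of 100 runs still makes the recipe look better than its typical run.

```python
import random
import statistics


def simulate_file_drawer(n_runs=100, true_quality=0.5, noise=0.1, seed=0):
    """Toy model of the file-drawer problem (all numbers hypothetical).

    Every run has the same underlying quality; observed scores differ only by
    Gaussian noise. Only the single best run gets published.
    """
    rng = random.Random(seed)
    scores = [true_quality + rng.gauss(0.0, noise) for _ in range(n_runs)]
    return {
        "true_quality": true_quality,
        "mean_of_all_runs": statistics.mean(scores),
        "published_best": max(scores),
    }


if __name__ == "__main__":
    result = simulate_file_drawer()
    for key, value in result.items():
        print(f"{key}: {value:.3f}")
```

The gap between `published_best` and `true_quality` is pure noise selected by the person sharing the result; it grows with the number of unpublished attempts.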


Replies

jey · 11/09/2024

What's "IP transfer" in this context?