teaearlgraycold • today at 2:40 AM

Well, diffusers are trained unsupervised on raw pictures. I don't know how they train multi-modal LLMs on images, but yes, obviously they are consuming media other than just text. I don't think, though I would be happy to be corrected, that models glean much of their "knowledge" from non-textual training data.

Replies

mikert89 • today at 4:03 AM

You couldn't be more wrong.
