I am confused, how can functions that output images help with functions that should take images as input?
They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images you gotta know what things look like.
They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images you gotta know what things look like.