elil17 • today at 7:24 AM • 2 replies

I wonder whether this could be used to fine-tune image models to provide better outputs. Something like this:

1. Algorithmically generate an underdrawing (e.g. place numbers and shapes randomly in the underdrawing)

2. Algorithmically generate a description of the underdrawing (e.g. for each shape, output text like "there is a square with the number three in the top left corner"). You might fuzz this by having an LLM rewrite the descriptions in a variety of ways.

3. Generate a "ground truth" image using the underdrawing and an image+text-to-image model.

4. Use the generated description and the generated "ground truth" image as training data for a text-to-image model.
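
For what it's worth, steps 1 and 2 can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: a 3x3 grid of regions, three shape types, single-digit numbers, and made-up names (`Item`, `generate_underdrawing`, `region_name`, `describe`). It produces a layout spec and a matching description; rasterizing the layout into an actual underdrawing, the LLM rewording, and steps 3 and 4 are left out.

```python
import random
from dataclasses import dataclass

# Assumed vocabulary; any of these lists could be extended.
SHAPES = ["square", "circle", "triangle"]
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]
ROWS = ["top", "middle", "bottom"]
COLS = ["left", "center", "right"]


@dataclass(frozen=True)
class Item:
    """One shape in the underdrawing (hypothetical structure)."""
    shape: str
    number: int
    row: int  # 0..2, top to bottom
    col: int  # 0..2, left to right


def generate_underdrawing(n_items: int, rng: random.Random) -> list[Item]:
    """Step 1: place n_items random shapes in distinct cells of a 3x3 grid."""
    all_cells = [(r, c) for r in range(3) for c in range(3)]
    cells = rng.sample(all_cells, n_items)  # distinct cells, so no overlap
    return [Item(rng.choice(SHAPES), rng.randrange(10), r, c) for r, c in cells]


def region_name(row: int, col: int) -> str:
    """Turn a grid cell into a phrase such as 'top left corner'."""
    if row == 1 and col == 1:
        return "center"
    if row == 1:
        return f"middle {COLS[col]}"
    if col == 1:
        return f"{ROWS[row]} center"
    return f"{ROWS[row]} {COLS[col]} corner"


def describe(items: list[Item]) -> str:
    """Step 2: emit one template sentence per shape."""
    return " ".join(
        f"There is a {it.shape} with the number {NUMBER_WORDS[it.number]} "
        f"in the {region_name(it.row, it.col)}."
        for it in items
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a dataset can be regenerated
    layout = generate_underdrawing(4, rng)
    print(describe(layout))
```

Each `(layout, description)` pair would then go through the image+text-to-image model in step 3, and the description plus the resulting image become one training example for step 4.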

Replies

hirako2000 • today at 7:45 AM

That would complicate the model's architecture just to solve a finite set of cases. That's an argument for specialised/fine-tuned models, though.

slickytail • today at 8:27 AM

[dead]
