>The estimated training time for the end-to-end model on an 8×H100 machine is 2.6 days.
That's a $250,000 machine for the "micro budget". Or, if you don't want to run it locally, roughly $2,000 to train the one model on someone else's hardware.
This is the first time I've come across the term "micro-budget" in an AI context.
> end-to-end model on an 8×H100 machine is 2.6 days

Based on the pricing on the Lambda Labs site, that's about $215, which isn't bad for training a model for educational purposes.
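The back-of-envelope math is just rate × GPUs × hours. A minimal sketch, assuming a hypothetical on-demand rate of ~$2.99 per GPU-hour (the rate is an assumption, not from the article; cloud prices vary by provider and reservation type):

```python
def training_cost(rate_per_gpu_hour: float, num_gpus: int, days: float) -> float:
    """Estimate cloud training cost for a multi-GPU run."""
    hours = days * 24
    return rate_per_gpu_hour * num_gpus * hours

# Assumed rate: $2.99/GPU-hr (hypothetical; check current provider pricing)
cost = training_cost(rate_per_gpu_hour=2.99, num_gpus=8, days=2.6)
print(f"${cost:,.0f}")  # ~$1,493 at this assumed rate
```

Different rates give very different totals, which may explain why cost estimates for the same 2.6-day run vary between comments here.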
I love models on a budget. These are the ones that really make us think about what we're doing and bring out new ideas.
The pixel art these models produce continues to look like shit and still isn't actual pixel art.
Where'd you get your dataset? Did you get permission from the rightsholders to use their work for this?
The differently styled images of "astronaut riding a horse" are great, but that has been a go-to example for image generation models for a while now. The introduction says that they train on 37 million real and synthetic images. Are astronauts riding horses now represented in the training data more than would have been possible 5 years ago?
If it's possible to get good, generalizable results from such (relatively) small data sets, I'd like to see what this approach can do if trained exclusively on non-synthetic permissively licensed inputs. It might be possible to make a good "free of any possible future legal challenges" image generator just from public domain content.