How would you generate a picture of Noun + Noun in the first place, in order to train the LLM on what it would look like? What's happening during that estimated 1 second?
This is why everyone trains their LLM on another LLM. It's all about the pelicans.
It's pelicans all the way down.