
dlenski today at 6:29 AM

A nice illustration of the homogeneity of LLM responses. Another way to describe this effect would be…

If you ask humans to write 1,000 books, you're asking 1,000 different humans with different experiences and different skills and different moods (etc.) to write those books.

But if you ask LLMs to write 1,000 books, you're probably only talking to 3 or 5 different models, tops. And they've all trained on the same or similar data, and are trained to respond in very similar ways.

The LLMs don't differ much in anything like "life experience" or "skills", and they don't really have anything like a "mood" independent of the prompts you've given them.


Replies

NitpickLawyer today at 8:36 AM

> A nice illustration of the homogeneity of LLM responses. [...] And they've all trained on the same or similar data, and are trained to respond in very similar ways.

I mostly agree, but this is a very simplified explanation. The models are indeed trained to respond in similar ways, for "basic" prompts. And that's as much a feature as it is a bug. In other words, the bug becomes apparent only if you give 100+ basic prompts. But giving it 100+ basic prompts and expecting originality is a silly endeavour. That's not how you get originality.

The way I'd go about generating 1000 books, while expecting different outcomes, is something along these lines (and nowadays you can ask your favorite LLM to wire up this workflow for you, with decent outcomes):

1. Ask for a list of 20 features that define a book (genre, style, number of characters, tropes, plot, continuity, relationships, etc.)

2. For each feature, ask for a list of 50 examples, ordered from most common to the most unique.

3. Randomly pick 10 features, and for each pick one of the 50 generated items. Ask for the rest of the features to match the theme.

4. Ask for 10 possible book outlines that match the chosen features, randomly pick between 2-8.

5. Create a detailed prompt that includes all the above features, and ask for a synopsis for each chapter, given the above outline chosen.

6. Given {features} and {outline} and {synopsis} write chapter 1.

7. For each chapter in the list, given {...} and (optionally) the previous matching chapter(s), write chapter n+1.

(optional 8.) given {...} and 2-3 consecutive chapters, align the ending / beginning of a new chapter for style / features / continuity, etc.

(optional 9.) given {...} and the whole book, list chapters / paragraphs that don't match the given {...} and provide a list of 5 improvements. (randomly choose 1 and ask for an edit).
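Steps 1-3 and the prompt for step 4 could be wired up roughly like this. It's only a sketch of the sampling idea, not a definitive implementation: `fake_ask_list` is a hypothetical offline stand-in for whatever model call you'd actually use, and `sample_book_spec` / `build_outline_prompt` are names made up for illustration. The point it shows is that the variety comes from your own random picks, not from the model.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical signature for "ask the model for a list of n items".
AskList = Callable[[str, int], List[str]]


def fake_ask_list(prompt: str, n: int) -> List[str]:
    """Offline placeholder for a real LLM call; returns n dummy items."""
    return [f"{prompt} #{i + 1}" for i in range(n)]


def sample_book_spec(
    ask_list: AskList,
    rng: random.Random,
    n_features: int = 20,
    n_examples: int = 50,
    n_pinned: int = 10,
) -> Tuple[Dict[str, str], List[str]]:
    # Step 1: ask for the features that define a book.
    features = ask_list("feature that defines a book", n_features)
    # Step 2: for each feature, ask for examples (most common to most unique).
    examples = {f: ask_list(f"example of [{f}]", n_examples) for f in features}
    # Step 3: pin a random subset of features to one random example each.
    pinned = rng.sample(features, n_pinned)
    spec = {f: rng.choice(examples[f]) for f in pinned}
    # The remaining features are left for the model to fill in to match the theme.
    free = [f for f in features if f not in spec]
    return spec, free


def build_outline_prompt(spec: Dict[str, str], free: List[str]) -> str:
    """Step 4 prompt: request outlines that honor the pinned features."""
    pinned_lines = "\n".join(f"- {k}: {v}" for k, v in spec.items())
    return (
        "Propose 10 book outlines that match these features:\n"
        f"{pinned_lines}\n"
        "Choose values for the remaining features so they fit the theme: "
        + ", ".join(free)
    )


if __name__ == "__main__":
    spec, free = sample_book_spec(fake_ask_list, random.Random())
    print(build_outline_prompt(spec, free))
```

Swap `fake_ask_list` for a real model call and run it 1000 times with an unseeded `random.Random()`, and each run starts from a different combination of pinned features.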

----

Now, this probably won't give you something like Cloud Atlas, but the results will at least be different books. That's how I'd do it if I wanted to see how differently they can write, rather than sending 1000 "basic" prompts and expecting originality.

smusamashah today at 7:41 AM

Reminds me of Pluribus.

ekianjo today at 8:18 AM

Prompts will give very different results. This is where you do the work.

throw310822 today at 6:51 AM

> you're asking 1,000 different humans with different experiences and different skills and different moods

Simply put, if you ask an LLM, you're always asking the same mind, and always for the first time.

fragmede today at 7:18 AM

That discounts how much the other context, i.e. the system prompt and anything else submitted to the model, can affect the output. If you ask a model for medical advice as a patient versus as a doctor, you will get different output from the same model.