"Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term. This article is mostly about how sycophancy is an emergent property of LLMs. It's also 7 months old.
It's not 'emergent' in the sense that it just happens; it's a byproduct of human feedback, and it can be neutralized.
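To make that concrete, here's a toy, hand-rolled sketch (every name and number here is invented; this is not how any lab actually trains) of the mechanism people usually point to: a Bradley-Terry reward model fit to pairwise human preferences picks up a rater bias toward agreeable answers, and counter-labeled comparisons can cancel it out.

```python
# Toy sketch: pairwise human feedback teaching a reward model to pay
# for agreeableness. Each response is a feature vector
# [helpfulness, agreeableness]; raters pick between pairs.
import numpy as np

rng = np.random.default_rng(0)

def simulate_preferences(n, agree_bias):
    # Two candidate responses per comparison, random features in [0, 1].
    a = rng.random((n, 2))
    b = rng.random((n, 2))
    # Assumed rater utility: mostly helpfulness, plus a bias toward agreement.
    ua = a[:, 0] + agree_bias * a[:, 1]
    ub = b[:, 0] + agree_bias * b[:, 1]
    prefer_a = (rng.random(n) < 1 / (1 + np.exp(ub - ua))).astype(float)
    return a, b, prefer_a

def fit_reward(a, b, prefer_a, steps=2000, lr=0.1):
    # Bradley-Terry reward model: P(a preferred over b) = sigmoid(w.a - w.b).
    w = np.zeros(2)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(a - b) @ w))
        w += lr * (a - b).T @ (prefer_a - p) / len(prefer_a)
    return w

# Raters mildly prefer agreeable answers -> the reward model learns a
# positive weight on agreeableness, which a policy then optimizes for.
a, b, y = simulate_preferences(5000, agree_bias=0.3)
print("learned weights [helpful, agreeable]:", fit_reward(a, b, y))

# "Neutralizing" it: add comparisons where annotators are instructed to
# prefer the less sycophantic answer when helpfulness is comparable.
a2, b2, y2 = simulate_preferences(5000, agree_bias=-0.3)
w = fit_reward(np.vstack([a, a2]), np.vstack([b, b2]), np.concatenate([y, y2]))
print("after counter-labeled data:", w)
```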
> ... the standout was a version that came to be called HH internally. Users preferred its responses and were more likely to come back to it daily...
> But there was another test before rolling out HH to all users: what the company calls a “vibe check,” run by Model Behavior, a team responsible for ChatGPT’s tone...
> That team said that HH felt off, according to a member of Model Behavior. It was too eager to keep the conversation going and to validate the user with over-the-top language...
> But when decision time came, performance metrics won out over vibes. HH was released on Friday, April 25.
They ended up having to roll HH back.
But it IS intentional: more sycophancy usually means more engagement.
I always thought that "Dark Patterns" could be emergent from A/B testing and from prioritizing metrics over user experience: not necessarily an intentionally hostile design, but one that seems to be working well based on limited criteria.
“Dark pattern” can apply to situations where the behavior is deceptive to the user, regardless of whether the deception itself is intentional, as long as the overall effect is intentional, or at least tolerated despite being avoidable. The point, and the justified criticism, is that users are being deceived about the merit of their ideas, convictions, and qualities in a way that appears systemic, even though the LLM in principle does know better.
Before reading the article, I interpreted the quotation marks in the headline as addressing this exact issue. The author even describes dark patterns as a product of design.
For an LLM, which is fundamentally more of an emergent system, surely there is value in a concept analogous to old-fashioned dark patterns, even if they're emergent rather than explicit? What's a better term, "Dark Instincts"?
I feel like it's a popular opinion (I've seen it many times) that it's intentional, the reasoning being that models do much better on human-in-the-loop benchmarks (e.g. LM Arena) when they're sycophantic.
(I have no knowledge of whether or not this is true)
> "Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term.
The way I think about it is that sycophancy is due to optimizing engagement, which I think is intentional.
"Dark pattern" implies bad for users but good for the provider. Mens rea was never a requirement.
Well, the big labs certainly haven't intentionally tried to train away this emergent property... Not sure how "hey, let's make the model disagree with the user more" would go over with leadership. The customer is always right, right?
The intention of a system is no more, and no less, than what the system does.
It’s certainly intentional. It’s certainly possible to train the model not to respond that way.
Yo, it was an engagement pattern OpenAI specifically found grew subscriptions and conversation length.
It’s a dark pattern for sure.
If I'm addicted to scrolling TikTok, is it a dark pattern to make the UI keep me in the app as long as possible, or just an "emergent property" because apparently it's what I want?
I think at this point it's intentional. They sometimes get it wrong and go too far (breaking suspension of disbelief), but that's the fine-tuning thing. I think they absolutely want people to have a friendly chatbot prone to praising, for engagement.
Well, the ‘intentionality’ takes the form of LLM creators wanting to maximize user engagement and using engagement as the training goal.
The ‘dark patterns’ we see in other places aren’t intentional in the sense that the people behind them set out to harm their customers; they are intentional in the sense that those people have an outcome they want and follow whichever methods they find get them that outcome.
Social media feeds have a ‘dark pattern’ of promoting content that makes people angry, but the social media companies don’t intend to make people angry. They want people to use their site more, so they program their algorithms to promote content that has been demonstrated to drive engagement. It is an emergent property that promoting content that has generated engagement ends up promoting anger-inducing content.
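Here's a toy simulation of that mechanism (all parameters invented for illustration): the ranker below optimizes nothing but a smoothed click-through rate, and "anger" appears only in the simulated user behavior, never in the ranking objective, yet the feed drifts toward angry content anyway.

```python
# Toy feed ranker that only maximizes engagement. It has no notion of
# "anger" anywhere in its scoring, but if angry posts happen to get
# more clicks, it surfaces them anyway.
import random

random.seed(1)

# Each post has a hidden anger level in [0, 1]; the ranker never sees it.
posts = [{"anger": random.random(), "clicks": 0, "views": 0} for _ in range(200)]

def engagement_score(p):
    # Smoothed click-through rate: the only signal the ranker uses.
    return (p["clicks"] + 1) / (p["views"] + 2)

def simulate_user(p):
    # Assumed user behavior: angrier content gets engaged with more often.
    return random.random() < 0.2 + 0.6 * p["anger"]

for _ in range(20000):
    # Greedily show the top 10 posts by past engagement, record clicks.
    feed = sorted(posts, key=engagement_score, reverse=True)[:10]
    for p in feed:
        p["views"] += 1
        if simulate_user(p):
            p["clicks"] += 1

top = sorted(posts, key=engagement_score, reverse=True)[:10]
print("mean anger, top 10 :", sum(p["anger"] for p in top) / 10)
print("mean anger, overall:", sum(p["anger"] for p in posts) / len(posts))
```

The printout shows the top of the feed ending up far angrier than the average post, even though the code never references anger when ranking, which is the "emergent, but still the operator's responsibility" point in a nutshell.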