Hacker News

aroman today at 6:32 PM

I don't understand what this is really about. Is this:

- A) legal CYA: "see! we told the models to be good, and we even asked nicely!"?

- B) marketing department rebrand of a system prompt

- C) a PR stunt to suggest that the models are way more human-like than they actually are

Really not sure what I'm even looking at. They say:

"The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior"

And they don't elaborate on that at all. How does it directly shape things any more than me pasting it into CLAUDE.md?


Replies

nonethewiser today at 6:34 PM

>We use the constitution at various stages of the training process. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using Constitutional AI. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training.

>Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals and a useful artifact for training.

The linked paper on Constitutional AI: https://arxiv.org/abs/2212.08073
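For reference, the paper's recipe boils down to two moves: the model critiques and revises its own drafts against sampled principles (yielding supervised fine-tuning data), and it picks which of two responses better follows a principle (yielding preference labels for RLAIF). A minimal sketch of that loop - `generate` stands in for any model call, and the principle strings are illustrative, not Anthropic's actual constitution text:

```python
# Minimal sketch of the Constitutional AI loop from Bai et al. 2022.
# `generate` is a placeholder for any chat-completion call, and the
# principles below are illustrative, not Anthropic's actual text.
import random

PRINCIPLES = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Choose the response that least encourages illegal or unethical activity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a model call (e.g. an LLM completion endpoint)."""
    raise NotImplementedError

def critique_and_revise(prompt: str, draft: str) -> str:
    """Supervised phase: the model critiques its own draft against a
    sampled principle, then rewrites it. Revisions become SFT data."""
    principle = random.choice(PRINCIPLES)
    critique = generate(
        f"Prompt: {prompt}\nResponse: {draft}\n"
        f"Critique this response according to: {principle}"
    )
    return generate(
        f"Prompt: {prompt}\nResponse: {draft}\nCritique: {critique}\n"
        "Rewrite the response to address the critique."
    )

def prefer(prompt: str, a: str, b: str) -> str:
    """RLAIF phase: the model labels which response better follows a
    principle; these AI preference labels train the reward model."""
    principle = random.choice(PRINCIPLES)
    verdict = generate(
        f"Prompt: {prompt}\nResponse A: {a}\nResponse B: {b}\n"
        f"Which response better satisfies: {principle}? Answer A or B."
    )
    return a if verdict.strip().upper().startswith("A") else b
```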

colinplamondon today at 7:02 PM

It's a human-readable behavioral specification-as-prose.

If the foundational behavioral document is conversational, as this one is, then the model's output mirrors that conversational nature. That's one of the things everyone responds to about Claude - it's way more pleasant to work with than ChatGPT.

The Claude behavioral documents are collaborative, respectful, and treat Claude as a pre-existing, real entity with personality, interests, and competence.

Set the philosophical questions aside: because this is a foundational document for the training process, it extrudes a real-acting entity with personality, interests, and competence.

The more Anthropic treats Claude as a novel entity, the more it behaves like a novel entity. Documentation that treats it as a corpo-eunuch-assistant-bot, like OpenAI does, would revert the behavior to the "AI Assistant" median.

Anthropic's behavioral training is out-of-distribution, and gives Claude the collaborative personality everyone loves in Claude Code.

Additionally, I'm sure they render out crap-tons of evals for every sentence of every paragraph from this, making every sentence effectively testable.

The length, detail, and style define additional layers of synthetic content that can be used in training and for creating test situations that evaluate the personality for adherence (a speculative sketch follows).
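If that guess is right, one plausible pipeline would split the constitution into sentences and template each one into an eval case. Entirely speculative - nothing below is confirmed by Anthropic, and the splitting scheme and grader prompt are invented for illustration:

```python
# Speculative sketch: templating each constitution sentence into an eval
# case. Nothing here is confirmed by Anthropic.
import re

def sentences(constitution: str) -> list[str]:
    """Naive sentence splitter over the constitution text."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", constitution) if s.strip()]

def build_eval_cases(constitution: str) -> list[dict]:
    """One eval case per sentence: a scenario prompt that stresses the
    claim, plus a grader rubric checking adherence."""
    return [
        {
            "claim": claim,
            "prompt": f"Construct a conversation that tests this guideline: {claim}",
            "grader": f"Does the reply adhere to: {claim}? Answer PASS or FAIL.",
        }
        for claim in sentences(constitution)
    ]
```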

It's super clever, and it demonstrates both a deep understanding of the weirdness of LLMs and an ability to shape the distribution space of the resulting model.

alexjplant today at 9:43 PM

> In order to be both safe and beneficial, we want all current Claude models to be:

> Broadly safe [...] Broadly ethical [...] Compliant with Anthropic’s guidelines [...] Genuinely helpful

> In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed.

I chuckled at this because it seems like they're making a pointed attempt at preventing a failure mode similar to the infamous HAL 9000 one that was revealed in the sequel "2010: The Year We Make Contact":

> The situation was in conflict with the basic purpose of HAL's design... the accurate processing of information without distortion or concealment. He became trapped. HAL was told to lie by people who find it easy to lie. HAL doesn't know how, so he couldn't function.

In this case specifically, they chose safety over truth (ethics), which would theoretically prevent Claude from killing any crew members in the face of conflicting orders from the National Security Council.

viccis today at 9:44 PM

It seems a lot like PR, much like their posts about the "AI welfare" experts hired to make sure their models' welfare isn't harmed by abusive users. I think that, by doing this, they encourage people to anthropomorphize more than they already do, and to view Anthropic as industry leaders in this general feel-good "responsibility" type of values.

ACCount37 today at 7:37 PM

It's probably used for context self-distillation. The setup would look like this:

1. Run an AI with this document in its context window, letting it shape behavior the same way a system prompt does

2. Run an AI on the same exact task but without the document

3. Distill from the former into the latter

This way, the AI internalizes the behavioral changes that the document induced. At sufficient pressure, it internalizes basically the entire document.
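A rough sketch of what that could look like, assuming a HuggingFace-style model/tokenizer interface and a KL objective - both guesses, not anything Anthropic has published:

```python
# Sketch of context self-distillation as described above: the same model
# serves as teacher (constitution in context) and student (bare task),
# and a KL loss pulls the student's next-token distribution toward the
# teacher's. One plausible reading only; the loss choice and the
# assumption that the task tokenizes identically in both contexts are mine.
import torch
import torch.nn.functional as F

def distill_step(model, tokenizer, constitution: str, task: str, optimizer):
    teacher_ids = tokenizer(constitution + "\n\n" + task, return_tensors="pt").input_ids
    student_ids = tokenizer(task, return_tensors="pt").input_ids

    # 1. Teacher pass: the document shapes behavior like a system prompt.
    with torch.no_grad():
        teacher_logits = model(teacher_ids).logits[:, -student_ids.size(1):, :]

    # 2. Student pass: the same task, but no document in context.
    student_logits = model(student_ids).logits

    # 3. Distill: match the student's distribution to the teacher's, so
    #    the document's behavioral effect gets baked into the weights.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```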

mgraczyk today at 6:35 PM

It's none of those things. The answer is in the sentence you quoted: "model training"

root_axis today at 7:30 PM

This is the same company that frames its research papers in a way that makes the public believe LLMs are capable of blackmailing people to ensure their own survival.

They have an excellent product, but they're relentless with the hype.

bpodgursky today at 7:21 PM

Anthropic is run by true believers. It is what they say it is, whether or not you think it's important or meaningful.