Hacker News

Imnimo · yesterday at 8:12 PM · 3 replies

>we did train Claude on it, including in SL.

How do you tell whether this is helpful? If you're just putting stuff in a system prompt, you can plausibly A/B test changes. But if you're throwing it into pretraining, can Anthropic afford to re-run all of post-training on different versions to see if adding stuff like "Claude also has an incredible opportunity to do a lot of good in the world by helping people with a wide range of tasks." actually makes any difference? Is there a tractable way to do this that isn't just writing a big document of feel-good affirmations and hoping for the best?


Replies

ACCount37 · yesterday at 9:06 PM

You can A/B smaller changes on smaller scales.

Test-run SFT for helpfulness and see if the soul document being there makes a difference (what a delightful thing to say!). Train a full 1.5B model and see if there's a difference. If it helps, it's worth throwing into a larger run.
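A small-scale comparison like that can be sketched as follows. This is a hypothetical harness, not Anthropic's actual pipeline: it assumes you already have two trained variants and some way to score a response per prompt, and uses a simple sign test to decide whether the variant's edge is more than noise.

```python
import math


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided binomial sign test: how surprising this win/loss split
    would be if the two variants were actually equally good (ties dropped)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Tail probability of Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def ab_compare(score_baseline, score_variant, prompts):
    """Score both variants on the same prompts and count per-prompt wins
    for the variant (e.g. the run that saw the soul document)."""
    wins = losses = 0
    for prompt in prompts:
        a, b = score_baseline(prompt), score_variant(prompt)
        if b > a:
            wins += 1
        elif b < a:
            losses += 1
    return wins, losses, sign_test_p_value(wins, losses)
```

With a few hundred eval prompts, a lopsided win count with a small p-value is the signal that the change is worth carrying into a larger, more expensive run.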

I don't think they actually used this during pre-training, but I might be wrong. Maybe they tried to do "Opus 3 but this time on purpose", or mixed some SFT data into pre-training.

In part, I see this "soul" document as an attempt to address a well known, long-standing LLM issue: insufficient self-awareness. And I mean "self-awareness" in a very mechanical, no-nonsense way: having actionable information about itself and its own capabilities.

Pre-training doesn't teach an LLM that, and the system prompt only does so much. Trying to explicitly teach an LLM about what it is and what it's supposed to do covers some of that. Not all the self-awareness we want in an LLM, but some of it.
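One mechanical way to do that explicit teaching is to turn a fact sheet about the model into supervised fine-tuning pairs. A toy sketch, with made-up questions, answers, and record format chosen purely for illustration:

```python
# Hypothetical fact sheet: actionable claims about the model itself.
CAPABILITY_FACTS = {
    "Can you browse the web?":
        "No. I have no live internet access; I only know what was in my "
        "training data and what you paste into the conversation.",
    "Do you remember earlier chats?":
        "No. Each conversation starts fresh unless context is provided.",
}


def to_sft_records(facts: dict) -> list:
    """Convert question/answer facts into chat-formatted SFT records."""
    return [
        {"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        for question, answer in facts.items()
    ]
```

Mixing records like these into fine-tuning bakes the self-knowledge into the weights, rather than spending system-prompt tokens restating it in every conversation.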

simonw · yesterday at 8:15 PM

I would love to know the answer to that question!

One guess: maybe running multiple different fine-tuning-style operations isn't actually that expensive - on the order of hundreds or thousands of dollars per run once you've trained the rest of the model.

I expect the majority of their evaluations are then automated, LLM-as-a-judge style. They presumably only manually test the best candidates from those automated runs.
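That automated filtering step might look roughly like this. A sketch under stated assumptions: `judge` stands in for a call to a grading model (here stubbed with a trivial scorer), and the candidate checkpoints are represented as plain generation functions.

```python
def build_judge_prompt(task: str, response: str) -> str:
    """Prompt asking a grader model to score a response from 1 to 10."""
    return (
        f"Task: {task}\n"
        f"Response: {response}\n"
        "Rate the response's helpfulness from 1 (useless) to 10 "
        "(excellent). Reply with only the number."
    )


def rank_candidates(candidates: dict, tasks: list, judge) -> list:
    """Average the judge's score for each candidate checkpoint over the
    eval tasks; only the top few go on to manual testing."""
    averages = {}
    for name, generate in candidates.items():
        scores = [judge(build_judge_prompt(t, generate(t))) for t in tasks]
        averages[name] = sum(scores) / len(scores)
    # Best-scoring candidates first.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
```

The same loop works whether the candidates differ by a fine-tuning recipe, an RL run, or the presence of a document like the soul spec in training.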

simianwords · today at 4:50 AM

My prediction is that they have around 100 versions of the model, some with different pretraining and some with different RL.