Hacker News

kqr, today at 9:43 AM

How much of this reply is environmentalism that was baked in during post-training?

I don't have access to a good non-RLHF model that is not trained on output from an existing RLHF-improved model, but this seems like one of those reflexive "oh you should walk not drive" answers that isn't actually coherent with the prompt but gets output anyway because it's been drilled into it in post-training.