Exactly. Even this paper shows that model creativity drops significantly and that the models undergo mode collapse, like we saw in GANs, yet the companies keep using RLHF...
https://arxiv.org/abs/2406.05587
A nice talk about a researcher's experience/benchmarks with raw GPT-4, before and after RLHF:
https://www.youtube.com/watch?v=qbIk7-JPB2c