
s-macke 10/11/2024

We don't know. The paper and the problem were very prominent at the time. Some developers at Anthropic or OpenAI might have included it in some way, either as a test or as a task to improve the CoT via reinforcement learning.


Replies

meroes 10/12/2024

It almost assuredly made it into their data set via RLHF. Wild that these papers are getting published when RLHF'ers and up see this stuff in the wild daily, ahead of the papers.

Timeline is roughly:

Model developer notices a sometimes highly specific weak area -> ... -> RLHF'ers are asked to develop a bunch of very specific problems improving the weak area -> a few months go by -> A paper gets published that squeezes water out of stone to make AI headlines.

These researchers should just become RLHF'ers, because their efforts aren't uncovering anything unknown; it's just being dressed up with a little statistics. And by the time the research is out, the fixes are already identified internally, worked on, and nearing pushes.

I just realized AI research will be part of the AI bubble if it bursts. I don't think there was a .com research sub-bubble, so this might be novel.