Hacker News

CamperBob2, last Monday at 4:42 AM

LOL. Figuring out how to solve IMO-level math problems without "thinking" would be even more impressive than thinking itself. Now there's a parrot I'd buy.


Replies

lazarus01, last Monday at 1:31 PM

It isn't thinking; it's RL with reward hacking.

It's like a student who wins gold at the IMO but can't solve easier math problems because they never studied that type of problem, whereas a human who is good at IMO math generalizes to all math problems.

It's just memorizing a trajectory as part of a specific goal. That's what RL is.
