LOL. Figuring out how to solve IMO-level math problems without "thinking" would be even more impressive than thinking itself. Now there's a parrot I'd buy.
It's like a student who wins gold at the IMO but can't solve easier math problems because they never studied those types of problems, whereas a human who is good at IMO math generalizes to all math problems.
It's just memorizing a trajectory as part of a specific goal. That's what RL is.
It isn't thinking; it's RL with reward hacking.