notemap • today at 1:15 AM • 0 replies

If we're benchmarking problems, mind trying out this problem on Pro if you're willing to spare the compute?

https://www.acmicpc.net/problem/33797

I have the $20 plan and I think I found a weird bug, at least with the thinking version. It gets stuck in the same local minimum super quickly, even though the "fake solution" is easily disproved on random tests.
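For anyone unfamiliar with "disproved on random tests": the usual approach is a small stress-test loop that compares the suspect solution against a slow but obviously correct brute force on tiny random inputs. Here is a minimal sketch of that idea. The toy problem (maximum subarray sum) is only a stand-in and is not problem 33797, and `brute_force`, `fake_solution`, and `find_counterexample` are hypothetical names made up for illustration.

```python
import random


def brute_force(a):
    """Reference answer: best sum over every non-empty contiguous subarray."""
    best = a[0]
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            best = max(best, total)
    return best


def fake_solution(a):
    """Plausible but wrong stand-in: sums all positive values, ignoring contiguity."""
    positives = [x for x in a if x > 0]
    return sum(positives) if positives else max(a)


def find_counterexample(candidate, reference, trials=1000, seed=0):
    """Return the first small random input where the two functions disagree, else None."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        n = rng.randint(1, 6)  # tiny inputs keep counterexamples readable
        a = [rng.randint(-5, 5) for _ in range(n)]
        if candidate(a) != reference(a):
            return a
    return None


if __name__ == "__main__":
    print(find_counterexample(fake_solution, brute_force))
```

Keeping the inputs tiny is deliberate: a failing case with a handful of elements is much easier to reason about (or paste back into a chat) than a large one.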

It's at the point where sometimes I've fed it the editorial and it still converges to the fake solution.

https://chatgpt.com/share/68c8b2ef-c68c-8004-8006-595501929f...

I'm sure the model is capable of solving it, but seriously, I've tried across multiple generations (since about when o3 came out) to get GPT to solve this problem. I don't think it's hampered by a lack of innate ability; it literally just refuses to think critically about the problem. Maybe with better prompting it wouldn't get stuck as hard?
