Last April I asked Claude 3.7 Sonnet to solve AoC 2024 day 3 in x86-64 assembly, and it one-shotted solutions for both parts 1 and 2(!)
It's true this was 4 months after AoC 2024 came out, so the solution could in principle have been in its training data, but I think 4 months is way too soon for that.
Day 3 in 2024 isn't a Math Olympiad tier problem or anything, but it seems novel enough, and my prior experience with LLMs was that they were absolutely atrocious at assembly.
Last year, I saw LLMs do well in the first week, with accuracy dropping off after that.
But as others have said, it’s a night and day difference now, particularly with code execution.