Hacker News

amluto · yesterday at 8:52 PM · 12 replies

I've contemplated this a bit, and I think I have a somewhat unconventional take:

First, this is really impressive.

Second, with that out of the way, these models are not playing the same game as the human contestants, in at least two major regards. First, and quite obviously, they have massive amounts of compute power, which is kind of like giving a human team a week instead of five hours. Second, the competing models have absolutely massive memorization capacity, whereas the teams are allowed to bring a 25-page PDF with them and must manually transcribe anything from that PDF that they actually want to use in a submission.

I think that, if you gave me the ability to search the pre-contest Internet and a week to prepare my submissions, I would be kind of embarrassed if I didn't get gold, and I'd find the contest to be rather less interesting than I would find the real thing.


Replies

asboans · yesterday at 10:08 PM

Firstly, automobiles are really impressive.

Second, with that out of the way, these cars are not playing the same game as horses. First, and quite obviously, they have massive amounts of horsepower, which is kind of like giving a team of horses… many more horses. But cars also have an absolutely massive fuel capacity. Petrol is such an efficient store of chemical energy compared to hay, and cars can store gallons of it.

I think that if you gave my horse the strength of 300 horses and fed it pure gasoline, I would be kind of embarrassed if it wasn't able to win a horse race.

paladin314159 · yesterday at 9:01 PM

> I think that, if you gave me the ability to search the pre-contest Internet and a week to prepare my submissions, I would be kind of embarrassed if I didn't get gold, and I'd find the contest to be rather less interesting than I would find the real thing.

I don't know what your personal experience with competitive programming is, so your statement may be true for yourself, but I can confidently state that this is not true for the VAST majority of programmers and software engineers.

Much like trying to do IMO problems without tons of training/practice, the mid-to-hard problems in the ICPC are completely unapproachable to the average computer science student (who already has a better chance than the average software engineer) in the course of a week.

In the same way that LLMs have memorized tons of stuff, the top competitors capable of achieving a gold medal at the ICPC know algorithms, data structures, and how to pattern match them to problems to an extreme degree.

dddgghhbbfblk · today at 1:47 AM

I think that's because the framing around this (and similar stories about, e.g., IMO performances) is, in my opinion, slightly wrong. The interesting part is not that they can get a gold medal in the sense of ranking them against human competitors. As you say, the direct comparisons, while not entirely meaningless, are very hard to interpret in the best of cases. It's very much an apples-to-oranges situation.

Rather, the impressive thing is simply that an AI is capable of solving these problems at all. These are novel (i.e., not in the training set) problems that are really hard and beyond the ability of most professional programmers. The "gold medal" part is informative more in the sense that it indicates how many problems the AI was able to solve and how well it was able to solve them.

When talking with some friends about chatgpt just a couple years ago I remember being very confident that there was no way this technology would be able to solve this kind of novel, very challenging reasoning problem, and that there was no way it would be able to solve IMO problems. It's remarkable how quickly I've been proven wrong.

tdb7893 · today at 1:23 AM

As someone who went to the ICPC finals around a decade ago, I agree that the limited time is really the big constraint that these machine learning models don't experience in the same way. That said, these problems are hard. The actual coding of the algorithms is pretty easy (most of the questions use one of a handful of algorithms that you've implemented a hundred times by the time you reach the finals), but recognizing which one will actually solve the problem correctly is not obvious at all. I know a lot of people who struggled in their undergrad algorithms class, and I think many of them, given the ICPC finals problems, would struggle even with the ability to research.
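For a sense of what "one of a handful of algorithms" means here, below is a minimal sketch (my own illustrative example, not taken from any contest) of one such workhorse: breadth-first search for shortest paths in an unweighted graph. As the comment says, writing this is the easy part; the hard part is recognizing that a given problem statement reduces to it.

```python
from collections import deque

def bfs_shortest_path(adj, start, goal):
    """Return the shortest path length (in edges) from start to goal
    in an unweighted graph, or -1 if goal is unreachable.
    adj maps each node to a list of its neighbors."""
    dist = {start: 0}          # node -> distance from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:      # first visit is the shortest
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return -1

# Example: a diamond-shaped graph; shortest path 0 -> 3 is 2 edges.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_shortest_path(adj, 0, 3))  # -> 2
```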

modeless · yesterday at 9:06 PM

It doesn't matter how many instances were running. All that matters is the wall-clock time and the cost.

The fact that they don't disclose the cost is a clue that it's probably outrageous today. But costs are coming down fast. And hiring a team of these guys isn't exactly cheap either.

OtherShrezzing · yesterday at 11:53 PM

The human teams are also limited to one computer shared among three people. The models have access to an effectively unbounded number of computers.

My argument does feel a bit like the “Watson doesn’t need to physically push the button” equivalents from when that system beat Jeopardy for the first time. I assume 5 hours on a single high-end Mac would probably still be enough compute in the near future.

theragra · yesterday at 10:04 PM

I think your analogy is lacking. The human brain is much more efficient, so it is not right to say "giving a human team a week instead of five hours." Most likely, the whole of OpenAI's compute cannot match one brain in terms of connections, relations, and computational power.

roadside_picnic · today at 12:05 AM

> whereas the teams are allowed to bring a 25-page PDF

This is where I see the biggest issue. LLMs are first-and-foremost text compression algorithms. They have a compressed version of a very good chunk of human writing.

Beyond being text compression engines, LLMs are really good at interpolating text, based on the generalization induced by the lossy compression.

What this result really tells us is that, given a reasonably well-compressed corpus of human knowledge, the ICPC can be viewed as an interpolation task.
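The compression framing has a precise basis: under ideal entropy coding, a model's code length for a text is the sum of -log2 of the probabilities it assigns to each symbol, so better prediction is literally better compression. A toy sketch (the symbols and probabilities here are invented for illustration):

```python
import math

def code_length_bits(text, model):
    """Ideal (entropy-coding) length in bits of `text` under a
    next-character model: sum of -log2 P(char | prefix)."""
    bits = 0.0
    for i, ch in enumerate(text):
        p = model(text[:i], ch)   # model returns P(ch | prefix)
        bits += -math.log2(p)
    return bits

# A uniform model over 4 symbols: exactly 2 bits per character.
uniform = lambda prefix, ch: 0.25
# A model that has "learned" that 'a' dominates this corpus.
skewed = lambda prefix, ch: 0.7 if ch == "a" else 0.1

text = "aaab"
print(code_length_bits(text, uniform))  # -> 8.0 bits
print(code_length_bits(text, skewed))   # fewer bits: better model
```

The better the model predicts the text, the fewer bits it needs, which is the sense in which training a language model and building a compressor are the same optimization.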

_diyar · yesterday at 9:37 PM

I think your assessment is spot on. But I also think there's a bigger picture that's getting lost in the sauce, not just in your comment but in the general discourse around AI progress:

- We're currently unlocking capabilities to solve many tasks which could previously only be solved by the top-1% of the experts in the field.

- Almost all of that progress is coming from large-scale deep learning. It turns out transformers with autoregression + RL are mighty generalists (though still far from AGI).

Once it becomes cheap enough that the average Joe can tinker with models of this scale, every engineering field can apply it to its niche interests. And ultimately, nobody outside of these competitions cares whether you're playing by the same rules as humans; they only care that you make them wealthy, healthy, and comfy.

dist-epoch · yesterday at 9:21 PM

If you want to play that game, let's compute how much energy was spent to grow, house, and educate one team over the 20 years since they were born, against how much was spent training the model.

m3kw9 · today at 1:59 AM

The end game is being able to run tasks like this at any time and in any place.

th0ma5 · yesterday at 11:30 PM

Yes, yes. Given all this, why didn't it do better? And isn't it embarrassing to have done it through statistical brute force rather than intelligence?