birktj • yesterday at 6:42 PM • 2 replies

They apparently managed gold in the IOI as well, a result that was extremely surprising to me and that makes me rethink a lot of assumptions I have about current LLMs. Unfortunately, there was very little transparency about how they achieved those results, and the only source was a Twitter post. I want to know whether there was any third-party oversight, what kind of compute they used, how much power it took, what kind of models were involved, and how they were set up. In this case I see that DeepMind at least has a blog post, but as far as I can see it does not answer any of my questions.

I think this is huge news, and I cannot imagine models with this capability having anything other than a massive impact all over the world. It makes me more worried than excited: it is very hard to tell what this will lead to, which is probably what makes it scary for me.

However, with so little transparency from these companies and the extreme financial pressure to perform well in these contests, I have to be quite sceptical of how truthful these results are. If they are true, I think it is really remarkable, but I want some more solid proof before I change my worldview.

Replies

XenophileJKO • yesterday at 7:13 PM

So, aside from the question of human intervention, I don't think the specifics really matter. What this means is that it is possible, and that this capability will in time be commoditized.

This is helpful in framing the conversation, especially with "skeptics" of what these models are capable of.

conradkay • yesterday at 9:10 PM

I don't see that much reason to be skeptical since this basically lines up with the trend we've been seeing in their performance.
