Frontier models went from not being able to count the number of 'r's in "strawberry&q...

charleshn • last Saturday at 10:23 AM • 2 replies • view on HN

Frontier models went from not being able to count the number of 'r's in "strawberry" to getting gold at IMO in under 2 years [0], and people keep repeating the same clichés such as "LLMs can't reason" or "they're just next token predictors".

At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

[0] https://x.com/alexwei_/status/1946477742855532918

Replies

bwfan123 • last Saturday at 3:22 PM

> At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

Based on the past history with frontier-math & AIME 2025 [1],[2] I would not trust announcements which cant be independently verified. I am excited to try it out though.

Also, the performance of LLMs was not even bronze [3].

Finally, this article shows that LLMs were just mostly bluffing [4].

[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...

[2] https://x.com/DimitrisPapail/status/1888325914603516214

[3] https://matharena.ai/imo/

[4] https://arxiv.org/pdf/2503.21934

yeasku • yesterday at 1:41 AM

Open AI is 10 years old and and llm just told me a dolar is 1.03 euros.

➕ show 1 reply

alt Hacker News

Replies