
kingstnap · today at 5:10 PM · 4 replies

It has a SimpleQA score of 69%. SimpleQA is a benchmark that tests knowledge of extremely niche facts, so that score is ridiculously high (Gemini 2.5 *Pro* scored 55%), and it reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash model.

I'm speculating, but Google might have figured out some training magic trick to pack more information into the model's capacity. That, or this Flash model has a huge number of parameters or something.


Replies

scrollop · today at 8:00 PM

Also

https://artificialanalysis.ai/evaluations/omniscience

Prepare to be amazed

leumon · today at 7:07 PM

Or could it be that it's using tool calls during reasoning (e.g. a Google search)?
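
In case the mechanism is unclear, here is a minimal, fully stubbed sketch of what "tool calls in reasoning" means. Everything here (`call_model`, `run_search`, the TOOL_CALL/ANSWER message format) is a hypothetical stand-in, not Gemini's actual API: the model emits a structured tool request, the harness executes it, and the result is fed back into the context before generation resumes. If a model does this during a benchmark run, a high SimpleQA score would measure retrieval, not parametric knowledge.

```python
def call_model(context: str) -> str:
    # Stub: a real harness would call the LLM here.
    if "SEARCH_RESULT" not in context:
        return 'TOOL_CALL: search("capital of Kiribati")'
    return "ANSWER: South Tarawa"

def run_search(query: str) -> str:
    # Stub: a real harness would hit a search backend here.
    return "SEARCH_RESULT: South Tarawa is the capital of Kiribati."

context = "QUESTION: What is the capital of Kiribati?"
while True:
    out = call_model(context)
    if out.startswith("TOOL_CALL:"):
        query = out.split('"')[1]            # extract the quoted query
        context += "\n" + run_search(query)  # feed the result back in
    else:
        print(out)  # -> ANSWER: South Tarawa
        break
```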

tanh · today at 5:56 PM

This will be fantastic for voice. I presume Apple will use it.

GaggiX · today at 5:29 PM

>or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model.

More experts with a lower percentage of active ones -> more sparsity.
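
A back-of-the-envelope sketch of the parameter arithmetic behind that claim, using made-up layer sizes (illustrative only, not Gemini's actual configuration): in a mixture-of-experts layer, every expert's weights add to total (memorization) capacity, but only the routed top-k experts run per token, so growing the expert count at fixed top-k grows knowledge capacity without growing per-token compute.

```python
# Mixture-of-experts sparsity arithmetic (illustrative sizes, not Gemini's).
d_model = 4096                       # hidden size (made up)
d_ff = 16384                         # expert FFN width (made up)
per_expert = 2 * d_model * d_ff      # up- and down-projection weights

def moe_layer_params(num_experts: int, top_k: int) -> tuple[int, int]:
    total = num_experts * per_expert     # parametric-knowledge capacity
    active = top_k * per_expert          # per-token compute
    return total, active

# Dense baseline vs. two MoE layouts; the last two have identical active
# compute per token but an 8x difference in total capacity.
for num_experts, top_k in [(1, 1), (8, 2), (64, 2)]:
    total, active = moe_layer_params(num_experts, top_k)
    print(f"{num_experts:>3} experts, top-{top_k}: "
          f"total={total / 1e9:.2f}B, active={active / 1e9:.2f}B "
          f"({active / total:.0%} of weights per token)")
```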