
kingstnap · today at 5:10 PM · 4 replies

It has a SimpleQA score of 69%. SimpleQA is a benchmark that tests knowledge of extremely niche facts, so that score is ridiculously high (Gemini 2.5 *Pro* scored 55%), and it reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash model.

I'm speculating, but Google might have figured out some training magic trick to pack more information into the model's capacity. That, or this Flash model has a huge number of parameters or something.


Replies

scrollop · today at 8:00 PM

Also

https://artificialanalysis.ai/evaluations/omniscience

Prepare to be amazed

leumon · today at 7:07 PM

Or could it be that it's using tool calls during reasoning (e.g. a Google search)?
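
In case the mechanism is unclear, here is a minimal, fully stubbed sketch of what "tool calls in reasoning" means. Everything here (`call_model`, `run_search`, the TOOL_CALL/ANSWER message format) is a hypothetical stand-in, not Gemini's actual API: the model emits a structured tool request, the harness executes it, and the result is fed back into the context before generation resumes. If a model does this during a benchmark run, a high SimpleQA score would measure retrieval, not parametric knowledge.

```python
def call_model(context: str) -> str:
    # Stub: a real harness would call the LLM here.
    if "SEARCH_RESULT" not in context:
        return 'TOOL_CALL: search("capital of Kiribati")'
    return "ANSWER: South Tarawa"

def run_search(query: str) -> str:
    # Stub: a real harness would hit a search backend here.
    return "SEARCH_RESULT: South Tarawa is the capital of Kiribati."

context = "QUESTION: What is the capital of Kiribati?"
while True:
    out = call_model(context)
    if out.startswith("TOOL_CALL:"):
        query = out.split('"')[1]            # extract the quoted query
        context += "\n" + run_search(query)  # feed the result back in
    else:
        print(out)  # -> ANSWER: South Tarawa
        break
```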

tanh · today at 5:56 PM

This will be fantastic for voice. I presume Apple will use it.

GaggiX · today at 5:29 PM

>or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model.

More experts with a lower percentage of active ones -> more sparsity.
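
A back-of-the-envelope sketch of the parameter arithmetic behind that claim, using made-up layer sizes (illustrative only, not Gemini's actual configuration): in a mixture-of-experts layer, every expert's weights add to total (memorization) capacity, but only the routed top-k experts run per token, so growing the expert count at fixed top-k grows knowledge capacity without growing per-token compute.

```python
# Mixture-of-experts sparsity arithmetic (illustrative sizes, not Gemini's).
d_model = 4096                       # hidden size (made up)
d_ff = 16384                         # expert FFN width (made up)
per_expert = 2 * d_model * d_ff      # up- and down-projection weights

def moe_layer_params(num_experts: int, top_k: int) -> tuple[int, int]:
    total = num_experts * per_expert     # parametric-knowledge capacity
    active = top_k * per_expert          # per-token compute
    return total, active

# Dense baseline vs. two MoE layouts; the last two have identical active
# compute per token but an 8x difference in total capacity.
for num_experts, top_k in [(1, 1), (8, 2), (64, 2)]:
    total, active = moe_layer_params(num_experts, top_k)
    print(f"{num_experts:>3} experts, top-{top_k}: "
          f"total={total / 1e9:.2f}B, active={active / 1e9:.2f}B "
          f"({active / total:.0%} of weights per token)")
```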