> including all previous experiments
How far back do you go? What about experiments into architecture features that didn’t make the cut? What about pre-transformer attention?
But more generally, why are you so sure that the team that built Gemini didn’t exclusively use TPUs while they were developing it?
I think one of the reasons Gemini caught up so quickly is that they have so much compute at a fraction of the price everyone else pays.