
killerstorm · yesterday at 2:09 AM

I have seen this part. In fact, I checked the paper itself, where they provide more detailed numbers: it's still almost double the size of the base Gemma. Reusing the embeddings and attention weights doesn't make that much difference, since most of the weights are in the MLPs.
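
For intuition, here's a rough back-of-envelope sketch with made-up, Gemma-like numbers (hidden size, layer count, etc. are illustrative assumptions, not the paper's actual figures). It shows why a second stack that shares embeddings and attention, but not MLPs, still lands close to 2x the base:

    # Illustrative Gemma-like config (assumed, not from the paper)
    d_model = 2048        # hidden size
    n_layers = 18         # transformer blocks
    vocab = 256_000       # Gemma-style large vocabulary
    d_ff = 8 * d_model    # MLP intermediate size

    embed = vocab * d_model                  # token embeddings (tied in/out)
    attn_per_layer = 4 * d_model * d_model   # Q, K, V, O projections
                                             # (ignoring multi-query details)
    mlp_per_layer = 3 * d_model * d_ff       # gated MLP: gate, up, down

    base = embed + n_layers * (attn_per_layer + mlp_per_layer)
    # Second stack reusing embeddings and attention, with its own MLPs:
    shared = embed + n_layers * (attn_per_layer + 2 * mlp_per_layer)

    print(f"base:   {base / 1e9:.2f}B params")
    print(f"shared: {shared / 1e9:.2f}B params ({shared / base:.2f}x base)")

With these numbers the MLPs account for roughly 1.8B of the ~2.6B base parameters, so duplicating only the MLPs already pushes the total to about 1.7x the base.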