sigmar • yesterday at 3:34 PM • 4 replies

>publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately.

Google is still releasing a lot of LLM architecture research. They introduced speculative decoding for LLMs in 2022[1], and this year they released code to perform speculative decoding for their Gemma 4 model[2].

[1] https://arxiv.org/abs/2211.17192

[2] https://github.com/google-gemma/cookbook/blob/main/docs/mtp/...
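
For readers unfamiliar with the idea in [1]: a cheap draft model proposes several tokens, and the large target model verifies them, keeping the ones it agrees with. The paper uses a rejection-sampling acceptance rule that preserves the target's sampling distribution. The sketch below covers only the simpler greedy (temperature 0) case, where "accept" just means "the draft token equals the target's argmax". The "models" are hypothetical toy functions and every name here is made up for illustration; this is not the Gemma cookbook code.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its
# single most likely next token (greedy decoding only).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive greedy decoding: one target call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens,
    the target checks them, and the longest agreeing prefix is kept."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. Draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. Target verifies each position. A real system scores all
        #    positions in ONE batched forward pass; we loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: substitute the target's token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All drafts matched: the verification pass yields a bonus token.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out
```

Every kept token is the target's own greedy choice, so the output is identical to `greedy_decode(target, ...)`. The speedup in practice comes from step 2 being a single batched target pass; with the per-position calls written here, there is no speedup, only the same result.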

Replies

kamranjon • yesterday at 3:49 PM

Thanks for the clarification. Google does publish more than others, and I really appreciate the work they’re doing with the Gemma models, which are truly competitive open models. I do wish they’d publish more in-depth papers on Gemma, but I appreciate that the weights are open.

DiabloD3 • yesterday at 4:30 PM

They weren't the first to do MTP like this, and arguably did it wrong: the MTP heads are kept in a separate file and have to be welded in by the inference engine.

Qwen 3.6 shipped with working MTP first, and was also the first to have working MTP in llama.cpp.

janalsncm • today at 2:15 AM

They also shipped Gemma models with their new MatFormer architecture, which allows for dynamic computation.

https://arxiv.org/pdf/2310.07707v2
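
Roughly, the MatFormer paper linked above nests smaller feed-forward (FFN) blocks inside larger ones: a width-m sub-model uses only the first m hidden units of the full FFN, so one set of weights serves several model sizes. The sketch below shows only that inference-time slicing with random weights. It does not show the joint training that makes the smaller widths actually perform well, and the class and method names (`NestedFFN`, `random_init`, `extract`) are hypothetical, not from Gemma or the paper's code.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


class NestedFFN:
    """Toy MatFormer-style FFN block (hypothetical sketch, not Gemma code).

    Stores a single W1 (d_model x d_ff) and W2 (d_ff x d_model). A sub-model
    of width m reuses only the first m hidden units, so all widths share weights.
    """

    def __init__(self, w1: Matrix, w2: Matrix) -> None:
        self.w1, self.w2 = w1, w2
        self.d_ff = len(w2)

    @classmethod
    def random_init(cls, d_model: int, d_ff: int, seed: int = 0) -> "NestedFFN":
        # Random weights for illustration only; no training happens here.
        rng = random.Random(seed)
        w1 = [[rng.gauss(0.0, 0.5) for _ in range(d_ff)] for _ in range(d_model)]
        w2 = [[rng.gauss(0.0, 0.5) for _ in range(d_model)] for _ in range(d_ff)]
        return cls(w1, w2)

    def _check(self, m: int) -> None:
        if not 1 <= m <= self.d_ff:
            raise ValueError(f"width m must be in [1, {self.d_ff}], got {m}")

    def forward(self, x: Vector, m: int) -> Vector:
        """Compute ReLU(x @ W1[:, :m]) @ W2[:m, :]; cost grows linearly with m."""
        self._check(m)
        hidden = [max(0.0, sum(xi * row[j] for xi, row in zip(x, self.w1)))
                  for j in range(m)]
        return [sum(hidden[j] * self.w2[j][k] for j in range(m))
                for k in range(len(self.w2[0]))]

    def extract(self, m: int) -> "NestedFFN":
        """Slice out a standalone width-m FFN whose output matches forward(x, m)."""
        self._check(m)
        return NestedFFN([row[:m] for row in self.w1],
                         [row[:] for row in self.w2[:m]])
```

Taking a prefix of the hidden units is what makes the sub-models nested: the width can be chosen per call (`forward(x, m)`) or a smaller model can be sliced out once (`extract(m)`) and deployed on its own.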
