I have written gemma3 inference in pure C

46 points • by robitec97 • last Monday at 2:05 PM • 17 comments

Comments

My first implementation of gemma.cpp was kind of like this.

There's such a massive performance differential vs. SIMD though that I learned to appreciate SIMD (via highway) as one sweet spot of low-dependency portability that sits between C loops and the messy world of GPUs + their fat tree of dependencies.

If anyone wants to learn the basics, whip out your favorite LLM pair programmer and ask it to help you study the kernels in the ops/ library of gemma.cpp:

https://github.com/google/gemma.cpp/tree/main/ops
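
To make the "C loops vs. SIMD" gap above concrete, here is a minimal sketch in plain C of the same dot product written two ways. It is not code from gemma.cpp or Highway; the names `dot_scalar`, `dot_lanes`, and the `LANES` width of 8 are hypothetical and chosen only for illustration. The first version carries a single float accumulator, which compilers generally will not vectorize under strict floating-point rules because that would reorder the additions. The second keeps one independent accumulator per lane, which is the shape a SIMD kernel has, and which an optimizing compiler may be able to map onto vector registers (whether it does depends on the compiler, flags, and target).

```c
#include <stddef.h>

/* Plain scalar loop: a single accumulator forms one long dependency chain. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Hypothetical lane count for illustration; real SIMD widths vary by target. */
#define LANES 8

/* SIMD-shaped loop: LANES independent accumulators, reduced at the end. */
static float dot_lanes(const float *a, const float *b, size_t n) {
    float acc[LANES] = {0};
    size_t i = 0;

    /* Main body: process LANES elements per step, one accumulator per lane. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; ++l) {
            acc[l] += a[i + l] * b[i + l];
        }
    }

    /* Horizontal reduction of the per-lane partial sums. */
    float sum = 0.0f;
    for (size_t l = 0; l < LANES; ++l) {
        sum += acc[l];
    }

    /* Scalar tail for the leftover elements when n is not a multiple of LANES. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both functions compute the same quantity, but they add the terms in a different order, so results can differ in the last bits for general inputs. A library like Highway expresses the second pattern with explicit vector types instead of relying on the compiler to spot it.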


w4yai • today at 7:35 PM

> It proves that modern LLMs can run without Python, PyTorch, or GPUs.

Did we need any proof of that?


behnamoh • today at 8:14 PM

but why tho? next gemma is coming and no one uses gemma 3 in prod anyway.
