With the same mindset, but without even PyTorch as a dependency, there's a straightforward CPU implementation of llama/gemma in Rust: https://github.com/samuel-vitorino/lm.rs/
It's impressive to see how little code it actually takes to run these models at all.
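To give a feel for why so little code suffices (this is a minimal illustrative sketch, not code from lm.rs): a llama-style CPU forward pass boils down to a few tight numeric loops, with `matvec` doing essentially all the work. The function names and toy numbers below are my own assumptions.

```rust
// Illustrative sketch (NOT taken from lm.rs): the handful of primitives
// that dominate a llama/gemma-style CPU forward pass. Most of the rest
// of an inference engine is plumbing around loops like these.

/// Matrix-vector product: out[i] = sum_j w[i*n + j] * x[j].
/// This single loop accounts for nearly all of the compute in inference.
fn matvec(out: &mut [f32], w: &[f32], x: &[f32]) {
    let n = x.len();
    for (i, o) in out.iter_mut().enumerate() {
        let row = &w[i * n..(i + 1) * n];
        *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
}

/// RMSNorm, the normalization used by llama/gemma-family models.
fn rmsnorm(out: &mut [f32], x: &[f32], weight: &[f32], eps: f32) {
    let ms = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (ms + eps).sqrt();
    for ((o, &v), &w) in out.iter_mut().zip(x).zip(weight) {
        *o = v * scale * w;
    }
}

/// Numerically stable softmax over attention scores / output logits.
fn softmax(x: &mut [f32]) {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let sum: f32 = x
        .iter_mut()
        .map(|v| {
            *v = (*v - max).exp();
            *v
        })
        .sum();
    x.iter_mut().for_each(|v| *v /= sum);
}

fn main() {
    // Tiny smoke test with made-up numbers.
    let w = [1.0, 2.0, 3.0, 4.0]; // 2x2 weight matrix, row-major
    let x = [1.0, 1.0];
    let mut y = [0.0f32; 2];
    matvec(&mut y, &w, &x);
    println!("matvec:  {:?}", y); // [3.0, 7.0]

    let g = [1.0f32, 1.0];
    let mut n = [0.0f32; 2];
    rmsnorm(&mut n, &x, &g, 1e-5);
    println!("rmsnorm: {:?}", n);

    let mut probs = [1.0f32, 2.0, 3.0];
    softmax(&mut probs);
    println!("softmax: {:?}", probs);
}
```

Stack attention and a feed-forward block out of these pieces, stream the weights from disk, and you have a working (if unoptimized) inference loop; that's roughly the scope a small repo like this has to cover.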