Hacker News

Run Llama locally with only PyTorch on CPU

162 points by anordin95 | 10/08/2024 | 34 comments

Comments

yjftsjthsd-h | 10/11/2024

If your goal is

> I want to peel back the layers of the onion and other gluey-mess to gain insight into these models.

Then this is great.

If your goal is

> Run and explore Llama models locally with minimal dependencies on CPU

then I recommend https://github.com/Mozilla-Ocho/llamafile which ships as a single file with no dependencies and runs on CPU with great performance. Like, such great performance that I've mostly given up on GPU for LLMs. It was a game changer.
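
For anyone who wants to script against it rather than use the chat UI: launching a llamafile starts a local web server, and as far as I know it exposes an OpenAI-compatible chat completions endpoint on port 8080 by default. A minimal Python sketch (the port, path, and "model" field are assumptions about the default setup, not verified details):

    # Query a locally running llamafile over its OpenAI-compatible endpoint.
    # Assumes the llamafile is serving on the default port 8080.
    import json
    import urllib.request

    payload = {
        "model": "local",  # the llamafile serves whatever model it was packaged with
        "messages": [{"role": "user", "content": "Explain attention in one sentence."}],
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])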

klaussilveira | 10/12/2024

Fast enough for RPI5 ARM?

Ship_Star_1010 | 10/11/2024

PyTorch has a native LLM solution: https://github.com/pytorch/torchchat. It supports all the Llama models and runs on CPU, MPS, and CUDA. I'm getting 4.5 tokens per second with Llama 3.1 8B at full precision, CPU only, on my M1.
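
If you're wiring something like this up in plain PyTorch yourself, the CPU/MPS/CUDA part is just a backend-selection check; a rough sketch of the usual pattern (this is generic PyTorch, not torchchat's actual code):

    # Generic PyTorch backend selection: prefer CUDA, then Apple MPS, then CPU.
    import torch

    def pick_device() -> torch.device:
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():
            return torch.device("mps")
        return torch.device("cpu")

    device = pick_device()
    # Move the model and inputs to the chosen device before generating, e.g.
    # model.to(device); tokens = tokens.to(device)
    print(f"Running on {device}")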

I_am_tiberius | 10/11/2024

Does anyone know what the easiest way to fine-tune a model locally is today?

tcdent | 10/11/2024

> from llama_models.llama3.reference_impl.model import Transformer

FYI, this just imports the Llama reference implementation and patches the device.

There are more robust implementations out there.
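
For anyone curious, the "patching" mostly boils down to forcing parameter creation and checkpoint loading onto the CPU. A rough sketch of the idea (stand-in module and placeholder path, not the repo's actual code):

    # Rough sketch of CPU "device patching" with a stand-in module.
    import torch
    import torch.nn as nn

    torch.set_default_device("cpu")   # newly created parameters land on CPU

    model = nn.Linear(4096, 4096)     # stand-in for the reference Transformer
    model = model.to(torch.bfloat16)  # reference checkpoints are typically bf16

    # Loading a real checkpoint would look roughly like this (placeholder path):
    # state_dict = torch.load("consolidated.00.pth", map_location="cpu")
    # model.load_state_dict(state_dict, strict=False)
    model.eval()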

littlestymaar | 10/11/2024

In the same spirit, but without even PyTorch as a dependency, there's a straightforward CPU implementation of Llama/Gemma in Rust: https://github.com/samuel-vitorino/lm.rs/

It's impressive to realize how little code is needed to run these models at all.

anordin95 | 10/08/2024

Peel back the layers of the onion and other gluey-mess to gain insight into these models.