If your goal is
> I want to peel back the layers of the onion and other gluey-mess to gain insight into these models.
then this is great.
If your goal is
> Run and explore Llama models locally with minimal dependencies on CPU
then I recommend https://github.com/Mozilla-Ocho/llamafile which ships as a single file with no dependencies and runs on CPU with great performance. Like, such great performance that I've mostly given up on GPU for LLMs. It was a game changer.
A great place to start is the Llama 3.2 Q6 llamafile I posted a few days ago: https://huggingface.co/Mozilla/Llama-3.2-3B-Instruct-llamafi... We have a new CLI chatbot interface that's really fun to use, syntax highlighting and all. You can also use the GPU by passing the -ngl 999 flag.
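If you haven't run a llamafile before, the whole flow is roughly this (the exact filename below is a guess; use whatever the Hugging Face page actually serves):

```
# mark the single-file build executable, then just run it
chmod +x Llama-3.2-3B-Instruct.Q6_K.llamafile

# CPU inference, drops you into the chat interface
./Llama-3.2-3B-Instruct.Q6_K.llamafile

# same thing, but offloading all layers to the GPU
./Llama-3.2-3B-Instruct.Q6_K.llamafile -ngl 999
```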
> then I recommend https://github.com/Mozilla-Ocho/llamafile which ships as a single file with no dependencies and runs on CPU with great performance. Like, such great performance that I've mostly given up on GPU for LLMs. It was a game changer.
This is the first time I've had an "it just works" experience with LLMs on my computer. Amazing. Thanks for the recommendation!
Do you have a ballpark idea of how much RAM would be needed to run Llama 3.1 8B and 70B at 8-bit quantization?
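My own back-of-envelope, assuming an 8-bit quant is roughly one byte per weight plus some headroom for the KV cache and runtime buffers (the headroom figures below are guesses, not measurements), would be:

```
# q8 weights ~ 1 byte/parameter, so roughly params-in-billions GB for the weights alone;
# the +2 / +6 GB of headroom are my own rough assumption
echo "Llama 3.1 8B  @ q8: ~$(( 8 + 2 )) GB"
echo "Llama 3.1 70B @ q8: ~$(( 70 + 6 )) GB"
```

Does that roughly match what you see in practice?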
Thanks for the suggestion. I've added a link to llamafile in the repo's README. My focus, though, was on exploring the model itself.
Ollama (which also wraps llama.cpp) has GPU support; unless you're really attached to the idea of bundling the weights into the inference executable, it's probably a better choice for most people.
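For a rough sense of the difference, a minimal Ollama session looks something like this (assuming Ollama is installed and the llama3.1:8b tag is still in their registry):

```
# weights are pulled separately on first run rather than bundled into the binary;
# the GPU is used automatically when one is available
ollama run llama3.1:8b
```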