
Lm.rs: Minimal CPU LLM inference in Rust with no dependency

310 points by littlestymaar 10/11/2024 | 76 comments

Comments

simonw 10/11/2024

This is impressive. I just ran the 1.2G llama3.2-1b-it-q80.lmrs on an M2 64GB MacBook and it felt speedy and used 1000% of CPU across 13 threads (according to Activity Monitor).

    cd /tmp
    git clone https://github.com/samuel-vitorino/lm.rs
    cd lm.rs
    RUSTFLAGS="-C target-cpu=native" cargo build --release --bin chat
    curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/tokenizer.bin?download=true'
    curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/llama3.2-1b-it-q80.lmrs?download=true'
    ./target/release/chat --model llama3.2-1b-it-q80.lmrs
jll29 10/11/2024

This is beautifully written, thanks for sharing.

I could see myself using some of the source code in the classroom to explain how transformers "really" work; code is more concrete/detailed than all those pictures of attention heads etc.

Two points of minor criticism/suggestions for improvement:

- Libraries should not print to stdout, as that output may destroy the application's own output (imagine I want to use the library in a text editor to offer style checking). So it's best to write to a string buffer owned by a logging instance associated with an lm.rs object (a rough sketch follows this list).

- Is it possible to do all this without "unsafe", without twisting one's arm? I see there are uses of "unsafe", e.g. to force data alignment in the model reader (one safe alternative is sketched below).
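
Here is a rough sketch of the logging idea, assuming a hypothetical `Transformer` type and method names that are not lm.rs's actual API: the library appends messages to a buffer it owns, and the caller decides whether (and where) to print them.

    use std::fmt::Write;

    pub struct Transformer {
        // log buffer owned by the model object, instead of writing to stdout
        log: String,
    }

    impl Transformer {
        fn record(&mut self, msg: &str) {
            // String's fmt::Write impl never fails, so the result can be ignored
            let _ = writeln!(self.log, "{msg}");
        }

        // Hand the accumulated log to the caller and reset the buffer.
        pub fn take_log(&mut self) -> String {
            std::mem::take(&mut self.log)
        }
    }

And for the alignment point, one safe alternative is to decode the little-endian floats from the raw bytes with `from_le_bytes`, at the cost of a copy:

    // Decode f32 weights from an arbitrary byte slice without any
    // alignment assumptions and without `unsafe`.
    fn read_f32s(bytes: &[u8]) -> Vec<f32> {
        bytes
            .chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect()
    }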

Again, thanks and very impressive!

J_Shelby_J 10/11/2024

Neat.

FYI I have a whole bunch of Rust tools[0] for loading models and other LLM tasks, for example auto-selecting the largest quant based on the memory available, extracting a tokenizer from a GGUF, prompting, etc. You could use this to remove some of the Python dependencies you have.

Currently they're built to support llama.cpp, but this is pretty neat too. Any plans to support grammars?

[0] https://github.com/ShelbyJenkins/llm_client
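
The "largest quant that fits" idea boils down to a filter-and-max over the candidate files. A stand-alone sketch of the heuristic (hypothetical, not llm_client's actual API):

    // Given (quant name, file size in bytes) candidates, pick the largest
    // one that fits within the available memory budget.
    fn pick_quant(candidates: &[(String, u64)], available_bytes: u64) -> Option<&str> {
        candidates
            .iter()
            .filter(|(_, size)| *size <= available_bytes)
            .max_by_key(|(_, size)| *size)
            .map(|(name, _)| name.as_str())
    }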

wyldfire 10/11/2024

The title is less clear than it could be IMO.

When I saw "no dependency" I thought maybe it could be `no_std` (llama.c is relatively lightweight in this regard). But it's definitely not `no_std`, and in fact it seems to have several dependencies. Perhaps the point is that all of them are Rust dependencies?

gip 10/11/2024

Great! I did something similar some time ago [0], but the performance was underwhelming compared to C/C++ code running on the CPU (which points to my lack of understanding of how to make Rust fast). It would be nice to have some benchmarks of the different Rust implementations.

Implementing LLM inference should/could really become the new "hello world!" for serious programmers out there :)

[0] https://github.com/gip/yllama.rs
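
For what it's worth, the usual first suspects when a Rust port trails C/C++ on the CPU are build settings rather than the language itself: build in release mode, enable LTO, and let the compiler target the host CPU so it can use its SIMD extensions. Roughly:

    # Cargo.toml
    [profile.release]
    opt-level = 3
    lto = "fat"
    codegen-units = 1

    # build with the host CPU's instruction set enabled
    RUSTFLAGS="-C target-cpu=native" cargo build --release

Beyond that, the gap usually comes down to whether the matmul inner loops autovectorize and how the work is split across threads.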

echelon 10/11/2024

This is really cool.

It's already using Dioxus (neat). I wonder if WASM could be put on the roadmap.

If this could run a lightweight LLM like RWKV in the browser, then the browser unlocks a whole class of new capabilities without calling any SaaS APIs.
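
A WASM build would presumably pull in wasm-bindgen (a new dependency) and export an entry point along these lines; the function name and signature here are purely illustrative:

    use wasm_bindgen::prelude::*;

    // Hypothetical browser-facing entry point: the model's forward pass
    // would run here, entirely client-side.
    #[wasm_bindgen]
    pub fn generate(prompt: &str) -> String {
        format!("(model output for: {prompt})")
    }

Something like `wasm-pack build --target web` could then package it for direct import from the browser.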

lucgagan 10/11/2024

Correct me if I am wrong, but these implementations are all CPU-bound? I.e., if I have a good GPU, I should look for alternatives.

dcreater 10/12/2024

What's the value of this compared to llama.cpp?

dvt 10/12/2024

This is cool (and congrats on writing your first Rust lib!), but Metal/CUDA support is a must for serious local usage.

aravindputrevu 10/12/2024

Interesting. I appreciate the Rust community's enthusiasm for rewriting most of this stuff.

fuddle 10/11/2024

Nice work! It would be great to see some benchmarks comparing it to llm.c.

nikolayasdf123 10/12/2024

How does this compare to https://github.com/EricLBuehler/mistral.rs?

kvakkefly 10/12/2024

Would love to see a wasm version of this!

marques576 10/11/2024

Such a talented guy!

vietvu 10/12/2024

Another llama.cpp or mistral.rs? If it supports vision models then fine, I will try it.

EDIT: Looks like no Llama 3.2 11B yet.
