wild_egg • today at 1:17 AM • 3 replies

I don't have a GPU, so I tried the CPU option and got 0.6 t/s on my old 2018 laptop using their llama.cpp fork.

Then I found out they didn't implement AVX2 for their Q1_0_g128 CPU kernel. I added that and am now getting ~12 t/s, which isn't shabby for this old machine.

Cool model.
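
For readers wondering what an AVX2 path for a kernel like Q1_0_g128 might change, here is a minimal sketch. It is not the fork's code. The block layout is an assumption guessed from the name (groups of 128 weights, one sign bit per weight, one float scale per group), and every function name below is hypothetical. It only shows the general idea: the scalar loop handles one weight per step, while a SIMD-style loop keeps 8 independent partial sums, matching the 8 float32 lanes of a 256-bit AVX2 register.

```python
GROUP = 128  # weights per group; assumed from the "g128" suffix
LANES = 8    # a 256-bit AVX2 register holds 8 float32 values


def quantize_group(weights):
    """Pack 128 floats into (scale, sign_bits). Layout is an assumption."""
    assert len(weights) == GROUP
    # One shared magnitude per group: the mean absolute value.
    scale = sum(abs(w) for w in weights) / GROUP
    bits = 0
    for i, w in enumerate(weights):
        if w >= 0:
            bits |= 1 << i  # bit set means +scale, bit clear means -scale
    return scale, bits


def dot_group_scalar(scale, bits, x):
    """Reference path: one weight per iteration."""
    acc = 0.0
    for i in range(GROUP):
        sign = 1.0 if (bits >> i) & 1 else -1.0
        acc += sign * x[i]
    return scale * acc


def dot_group_lanes(scale, bits, x):
    """Same dot product with 8 partial sums, mimicking a SIMD loop."""
    partial = [0.0] * LANES
    for base in range(0, GROUP, LANES):
        chunk = (bits >> base) & ((1 << LANES) - 1)  # 8 sign bits at once
        for lane in range(LANES):
            sign = 1.0 if (chunk >> lane) & 1 else -1.0
            partial[lane] += sign * x[base + lane]
    # Horizontal reduction of the lanes, done once at the end.
    return scale * sum(partial)
```

In a real kernel the inner lane loop would be a handful of vector instructions rather than a Python loop, which is where a speedup would come from. The two functions add in a different order, so their results can differ by floating-point rounding.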

Replies

UncleOxidant • today at 3:04 AM

Are you getting anything besides gibberish out of it? I tried their recommended command line and it's dog slow, even though I built their llama.cpp fork with AVX2 enabled. This is what I get:

    $ ./build/bin/llama-cli -hf prism-ml/Bonsai-8B-gguf -p "Explain quantum computing in simple terms." -n 256 --temp 0.5 --top-p 0.85 --top-k 20 -ngl 99
    > Explain quantum computing in simple terms.

     \( ,

      None ( no for the. (,./. all.2... the                                                                                                                                ..... by/

EDIT: It runs fine in their Colab notebook. Looking at that, you have to run git checkout prism (in the llama.cpp repo) before you build. That instruction is missing if you go straight to their fork of llama.cpp. Works fine now.

cubefox • today at 2:54 AM

"Not shabby" is a big understatement.
