No, just updated the parent comment: I added -c 4096 to cut down the context size, and now the model loads.
I'm able to get 6-7 tokens/sec generation and 10-11 tokens/sec prompt processing with their model. Seems quite good, actually—much more useful than llama 3.2:3b, which has comparable performance on this Pi.
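For reference, the kind of llama.cpp invocation I mean looks roughly like this (binary name, model path, and thread count here are placeholders, not my exact command):

  # -c 4096 caps the context window, which shrinks the KV-cache allocation at load time
  ./llama-cli -m ./model.gguf -c 4096 -t 4 -p "Hello"

The smaller context is what lets it load at all on the Pi's limited RAM; the default context would try to reserve a much larger cache up front.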