Hacker News

batch12 · last Tuesday at 11:35 PM

Could they have added some swap?
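(For context, "adding swap" here would mean giving the Pi extra virtual memory so the model can load despite limited RAM. A minimal sketch of the usual approach on Raspberry Pi OS; the 4 GB size is illustrative, not from the thread:

    # Create and enable a 4 GB swap file (size chosen for illustration)
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

Inference paging through swap on an SD card would be very slow, which is presumably why reducing the context size was preferred below.)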


Replies

geerlingguy · last Tuesday at 11:38 PM

No; I just updated the parent comment. I added -c 4096 to cut down the context size, and now the model loads.
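(The -c flag is llama.cpp's context-size option; shrinking the context window shrinks the KV cache, which is what frees enough memory for the model to load. A sketch of the kind of invocation described, with the model path and prompt as placeholders:

    # Limit the context window to 4096 tokens so the KV cache fits in RAM
    ./llama-cli -m ./model.gguf -c 4096 -p "Hello"

)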

I'm able to get 6-7 tokens/sec generation and 10-11 tokens/sec prompt processing with their model. Seems quite good, actually; much more useful than llama3.2:3b, which runs at comparable speeds on this Pi.
