Hacker News

selcuka · yesterday at 11:54 PM · 1 reply

I use LM Studio for running models locally (macOS), and it tries to estimate whether the model would fit in my GPU memory (which on Apple Silicon Macs is unified memory, shared with the CPU).

The Q4_K_S quantized version of Microsoft Fara 7B is a 5.8GB download. I'm pretty sure it would work on a 12GB Nvidia card. Even the Q8 one (9.5GB) could work.
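The fit estimate boils down to simple arithmetic: the model file has to fit in VRAM with some headroom left for the KV cache and runtime buffers. A minimal sketch, where `overhead_gb` is a hypothetical allowance (LM Studio's actual heuristics aren't published):

```python
def fits_in_vram(model_file_gb: float, vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """Rough fit check: model weights plus an assumed overhead for the
    KV cache, activations, and runtime buffers must fit in VRAM."""
    return model_file_gb + overhead_gb <= vram_gb

# The downloads mentioned above, against a 12 GB card:
print(fits_in_vram(5.8, 12.0))  # Q4_K_S: True
print(fits_in_vram(9.5, 12.0))  # Q8: True, but with little headroom
```

The 1.5 GB overhead figure is an assumption; real usage depends on context length and backend.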


Replies

BoredomIsFun · today at 1:34 PM

12 GiB card, not GB. The binary-vs-decimal tail compounds to roughly an extra 800 MB at that size.
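The GiB-vs-GB gap the reply refers to is easy to verify: a GiB is 2^30 bytes while a GB is 10^9 bytes, so at 12 units the difference is close to 900 MB. A quick check:

```python
GIB = 2**30   # 1,073,741,824 bytes (binary gigabyte)
GB = 10**9    # 1,000,000,000 bytes (decimal gigabyte)

# Difference between a "12 GiB" card and 12 GB, in decimal megabytes:
diff_mb = (12 * GIB - 12 * GB) / 10**6
print(diff_mb)  # ≈ 884.9 MB
```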