I've found it very practical to run the 35B-A3B model on an 8GB RTX 3050; it's pretty responsive and does a good job on the coding tasks I've thrown at it. I need to grab the freshly updated models; the older one occasionally gets stuck in a loop with tool use, which they say they've fixed.
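For reference, here's roughly how I drive it, as a minimal sketch using llama-cpp-python (the GGUF file name and layer count are placeholders for my setup; tune n_gpu_layers until the weights fit in 8GB of VRAM, with the rest spilling into system RAM):

```python
from llama_cpp import Llama

# Load a quantized GGUF with a partial GPU offload: n_gpu_layers layers
# live in VRAM, the remaining layers are served from system RAM.
llm = Llama(
    model_path="model-a3b-q4_k_m.gguf",  # placeholder: your quantized GGUF
    n_gpu_layers=20,                     # tune to what fits in 8GB of VRAM
    n_ctx=8192,                          # context window
)

out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```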
Can you give some examples of the coding tasks? I had no idea local was that good.
Which models would those be?
I guess you're offloading to system RAM? What tokens per second do you get? I've got an old gaming laptop with an RTX 3060; sounds like it could work well as a local inference server.
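If I get it going, I figure a rough tokens-per-second check would look something like this (untested sketch with llama-cpp-python; the model file and layer split are guesses for the 3060):

```python
import time

from llama_cpp import Llama

# Placeholder model file and offload split; adjust for what fits in 6GB of VRAM.
llm = Llama(model_path="model-a3b-q4_k_m.gguf", n_gpu_layers=24, n_ctx=4096)

start = time.perf_counter()
out = llm("Explain what a mutex is.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]  # tokens produced this call
print(f"{generated / elapsed:.1f} tokens/sec")
```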