Hacker News

Twirrim · today at 3:17 AM

I've found it very practical to run the 35B-A3B model on an 8GB RTX 3050; it's pretty responsive and does a good job on the coding tasks I've thrown at it. I still need to grab the freshly updated models: the older one occasionally gets stuck in a loop during tool use, which they say they've fixed.
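For context, fitting a model that size into 8 GB of VRAM usually means partial GPU offload, e.g. with llama.cpp. A minimal sketch (the flag names are real llama.cpp options, but the model filename and layer count below are illustrative guesses, not the commenter's actual setup):

```shell
# Partial GPU offload with llama.cpp's llama-server:
#   -m          path to the quantized GGUF model file (placeholder name here)
#   -ngl        number of transformer layers to keep on the GPU;
#               the remaining layers stay in system RAM
#   --ctx-size  context window in tokens
llama-server -m model-35b-a3b-q4_k_m.gguf -ngl 24 --ctx-size 8192
```

In practice you raise `-ngl` until VRAM is nearly full; an A3B-style MoE only activates ~3B parameters per token, which is why it stays responsive even with much of the model in system RAM.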


Replies

fy20 · today at 6:08 AM

I guess you're offloading to system RAM? What tokens per second do you get? I've got an old gaming laptop with an RTX 3060; sounds like it could work well as a local inference server.

ufish235 · today at 3:56 AM

Can you give an example of some coding tasks? I had no idea local was that good.

fragmede · today at 3:57 AM

Which models would that be?