fy20 • yesterday at 6:08 AM • 1 reply • view on HN

I guess you are offloading to system RAM? How many tokens per second do you get? I've got an old gaming laptop with an RTX 3060; sounds like it could work well as a local inference server.

Replies

manmal • yesterday at 7:30 AM

In the article, they claim up to 25 t/s for the LARGEST model with a 24GB VRAM card. Need a lot of RAM, obviously.
