Context is always an issue with local models and consumer hardware.

nickthegreek • yesterday at 2:15 PM

Replies

Correct, but it should be some ratio of model size: if the model is x GB, the maximum context would occupy x times some constant of RAM. For the quantized version, assuming it's 18 GB at Q4, it should be able to support 64-128k of context on this Mac.
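To put rough numbers on that, here is a back-of-the-envelope sketch of KV cache size versus context length. It assumes a standard transformer with an fp16 cache, where each token stores one key and one value vector per layer. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the specs of any model in this thread, so swap in the real values from the model's config. Note that the "constant" is set by these architecture parameters rather than by the file size itself, so two models of the same size on disk can need quite different amounts of RAM per token.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Each token stores one key and one value vector (hence the factor 2)
    per layer, each of size n_kv_heads * head_dim elements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len


def max_context(free_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Largest context length whose KV cache fits in free_bytes."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return free_bytes // per_token


# Hypothetical architecture, for illustration only.
cfg = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for ctx in (64 * 1024, 128 * 1024):
    size = kv_cache_bytes(context_len=ctx, **cfg)
    print(f"{ctx} tokens -> {size / GIB:.1f} GiB")
# 65536 tokens -> 12.0 GiB
# 131072 tokens -> 24.0 GiB
```

With these made-up numbers, 64k of context costs about 12 GiB on top of the weights and 128k about 24 GiB, so whether it fits depends on how much unified memory is left after loading the 18 GB model. Quantizing the cache (a smaller bytes_per_elem) shrinks this proportionally.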


alt Hacker News
