nickthegreek yesterday at 2:15 PM

Context is always an issue with local models and consumer hardware.


Replies

pdyc yesterday at 2:20 PM

Correct, but it should be some ratio of the model size: if the model is x GB, the maximum context would occupy x times some constant of RAM. For a quantized version, assuming it's 18 GB at Q4, it should be able to support 64-128k of context on this Mac.
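A rough sketch of the arithmetic, assuming a standard transformer KV cache. The "constant" here is set by the architecture (layer count, KV heads, head dimension) and the cache precision rather than by the file size alone, so treat the numbers as illustrative. The model shape below (48 layers, 8 KV heads, head dimension 128, fp16 cache) is hypothetical and not taken from any specific model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a given context length.

    Each layer stores two tensors (keys and values), each holding
    context_len * n_kv_heads * head_dim elements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def max_context(free_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Largest context length whose KV cache fits in free_bytes."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return free_bytes // per_token


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=48, n_kv_heads=8, head_dim=128)

    for ctx in (64 * 1024, 128 * 1024):
        size = kv_cache_bytes(context_len=ctx, **shape)
        print(f"{ctx} tokens -> {size / GIB:.1f} GiB of KV cache")

    # How much context fits in whatever RAM is left after loading the weights.
    print(max_context(12 * GIB, **shape), "tokens fit in 12 GiB")
```

With that hypothetical shape, 64k tokens needs about 12 GiB on top of the weights and 128k about 24 GiB, so whether 64-128k fits depends on how much RAM is left after the 18 GB model is loaded. Quantizing the cache (a smaller `bytes_per_elem`) shrinks it proportionally.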
