logoalt Hacker News

a_e_kyesterday at 9:26 PM0 repliesview on HN

At least for the CPU/GPU split, llama.cpp recently added a `--fit` parameter (might default to on now?) that pairs with a `--fitc CONTEXTSIZE` parameter. That new feature will automatically look at your available VRAM and try to figure out a good CPU/GPU split for large models that leaves enough room for the context size that you request.