
a_e_k • yesterday at 9:26 PM

At least for the CPU/GPU split, llama.cpp recently added a `--fit` parameter (might default to on now?) that pairs with a `--fitc CONTEXTSIZE` parameter. The new feature automatically looks at your available VRAM and tries to work out a good CPU/GPU split for large models, one that still leaves enough room for the context size you request.
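
To make the idea concrete, here is a rough back-of-the-envelope sketch of that kind of fitting calculation. It is not llama.cpp's actual algorithm; the function name, the parameters, and the simple cost model (each offloaded layer costs its weights plus its share of the KV cache for the requested context) are all assumptions for illustration.

```python
MIB = 1024 * 1024


def layers_that_fit(free_vram, n_layers, layer_bytes,
                    kv_bytes_per_token_per_layer, n_ctx, margin=0):
    """Estimate how many layers can be offloaded to the GPU.

    Hypothetical cost model (an assumption, not llama.cpp's logic):
    every offloaded layer needs its weights plus the KV cache for
    n_ctx tokens, and `margin` bytes of VRAM are kept free.
    """
    budget = free_vram - margin
    if budget <= 0:
        return 0
    per_layer = layer_bytes + kv_bytes_per_token_per_layer * n_ctx
    # Whole layers only, and never more than the model has.
    return min(n_layers, budget // per_layer)


# 8 GiB free, 32 layers of 400 MiB each, 4 KiB of KV cache per token
# per layer, an 8192-token context, and 1 GiB kept free as headroom.
gpu_layers = layers_that_fit(
    free_vram=8192 * MIB,
    n_layers=32,
    layer_bytes=400 * MIB,
    kv_bytes_per_token_per_layer=4096,
    n_ctx=8192,
    margin=1024 * MIB,
)
print(gpu_layers)  # 16
```

With these made-up numbers, half the layers go to the GPU and the rest stay on the CPU. Asking for a larger context raises the per-layer cost, so fewer layers fit, which is the trade-off the feature is handling for you.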
