I’ve been using the ollama version (uses about 13 GB of RAM on macOS) and haven’t had that issue yet. I wonder if it’s maybe an issue with the llama.cpp port?
Never used ollama, only ready-to-go models via llamafile and llama.cpp.
Maybe ollama applies some defaults to its models? I start testing models at temperature 0 and tweak from there depending on how they behave.
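For example, a minimal sketch of pinning the temperature in both tools (the model tag, output name, and GGUF path below are just placeholders):

    # Ollama: a Modelfile that overrides the model's default sampling temperature
    FROM llama3
    PARAMETER temperature 0

    # build and run the customized model:
    #   ollama create llama3-temp0 -f Modelfile
    #   ollama run llama3-temp0

    # llama.cpp: pass the temperature directly on the command line
    #   ./llama-cli -m ./models/model.gguf --temp 0 -p "your prompt"

With ollama, whatever the Modelfile (or the library default) sets is what you get unless you override it at request time, which would explain behavior differing from a bare llama.cpp run.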