And that's exactly why llama.cpp is not usable by casual users. They follow the "move fast and break things" model. With ollama, you just have to make sure you're getting/building the latest version.
It's not possible to run the latest model architectures without 'moving fast'. The only thing broken here is that they are trying to use an old version with a new model.