I use llama.vim with llama.cpp and the qwen2.5-coder 7B model. It easily fits on a 16 GB GPU and is fast even on a tiny 70-watt RTX 2000 card. The completion quality is good enough for me; if I want something more sophisticated, I use something like Codex.