Hacker News

Gracana, today at 11:30 AM

You can do it slowly with ik_llama.cpp, lots of RAM, and one good GPU. Regular llama.cpp works too, but the ik fork has some enhancements that make this sort of thing more tolerable.