Hacker News

k__, last Saturday at 10:31 AM

Half-OT: Anything useful that runs reasonably fast on a regular Intel CPU/GPU?


Replies

oblio, last Saturday at 11:24 AM

I did a bunch of research and basically no, unless you can work with sending a request in the evening and getting the result in the morning.

And you'd need a lot of regular RAM, because otherwise you start swapping, at which point I think response times end up in days.

This tech is in its Wild West days. For it to be usable by the average person on consumer hardware, I think we'll need to be in 2030+.

ethan_smith, last Saturday at 4:49 PM

For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llama.cpp with 4-bit quantization. GGUF models with 4-bit quantization typically need roughly 0.5 to 0.7 GB of RAM per billion parameters for the weights (about 2 GB per billion is the FP16 figure), plus some headroom for the context, so even older machines can handle smaller models.
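
If you want to sanity-check the memory numbers yourself, here's a back-of-the-envelope sketch. It only counts the weights (parameters × bits per weight / 8); the KV cache and runtime overhead come on top. The function name is made up for illustration, and the 4.5 bits per weight is an assumption meant to account for the per-block scale factors that simple 4-bit formats store alongside the weights; other 4-bit variants land a bit higher.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Model sizes taken from the comment above; 4.5 bits/weight is an assumed
# effective rate for 4-bit quantization including per-block scales.
for name, size_b in [("TinyLlama", 1.1), ("Phi-2", 2.7)]:
    fp16_gb = estimate_weight_gb(size_b, 16)
    q4_gb = estimate_weight_gb(size_b, 4.5)
    print(f"{name}: ~{fp16_gb:.1f} GB at FP16, ~{q4_gb:.1f} GB at 4-bit")
```

So both models should fit comfortably in 8 GB of RAM at 4-bit, with room left for the context.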
