Half-OT: Anything useful that runs reasonably fast on a regular Intel CPU/GPU?

k__ • last Saturday at 10:31 AM • 2 replies

Replies

I did a bunch of research, and basically no, unless you can live with sending a request in the evening and getting the result in the morning.

You'd also need a lot of regular RAM, because otherwise you start swapping, at which point I think response times end up in days.

This tech is still in its Wild West days. For it to be usable by the average person on consumer hardware, I think we'll need to wait until 2030 or later.

ethan_smith • last Saturday at 4:49 PM

For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llama.cpp with 4-bit quantization. 4-bit GGUF models typically need roughly 0.5 to 0.7 GB of RAM per billion parameters for the weights, plus some overhead for the context, so even older machines can handle smaller models.
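
If you want to sanity-check whether a model fits in RAM, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight, divided by 8), so it ignores the context/KV cache and runtime overhead. The function name is made up for illustration, and the 4.5 bits per weight default is an assumption: it matches a 4-bit block format that stores one 16-bit scale per 32 weights, and other 4-bit variants will come out somewhat higher.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 10^9 bytes).

    Hypothetical helper: counts weights only, not context or runtime overhead.
    bits_per_weight=4.5 is an assumed effective rate for a 4-bit block format.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# The two models mentioned above, by parameter count in billions.
for name, size in [("TinyLlama", 1.1), ("Phi-2", 2.7)]:
    print(f"{name}: ~{estimate_weight_gb(size):.2f} GB of weights")
```

By this estimate both models stay well under 2 GB for the weights alone, so the practical limit on an older machine is more likely to be speed than memory.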
