Half-OT: Anything useful that runs reasonably fast on a regular Intel CPU/GPU?
For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llama.cpp with 4-bit quantization. GGUF models quantized to 4 bits typically need roughly 0.5 to 0.7 GB of RAM per billion parameters for the weights, plus some overhead for the context, so even older machines can handle smaller models.
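If you want to sanity-check whether a model will fit, here is a rough back-of-the-envelope sketch. The function name `estimate_weight_gb` and the default of 4.5 bits per weight (4-bit values plus per-block scale data) are my own assumptions for illustration, and the result covers the weights only, not the context cache or runtime overhead.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 10^9 bytes).

    bits_per_weight is an assumed average; real quantization formats vary.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts taken from the post above.
for name, params_billion in [("TinyLlama", 1.1), ("Phi-2", 2.7)]:
    size_gb = estimate_weight_gb(params_billion)
    print(f"{name} ({params_billion}B): about {size_gb:.2f} GB for weights")
```

Leave a good margin on top of that number for the OS and the context, otherwise you end up swapping.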
I did a bunch of research and the answer is basically no, unless you can live with sending a request in the evening and getting the result in the morning.
And you'd need a lot of regular RAM, because otherwise you start swapping, at which point I think response times end up in days.
This tech is still in its Wild West days. For it to be usable by the average person on consumer hardware, I think we'll need to be in 2030+.