Hacker News

KeplerBoy · yesterday at 12:48 PM

It's not even hard, just slow. You could do it on a single cheap server (compared to a rack full of GPUs): run a CPU LLM inference engine and limit it to a single thread.
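A minimal sketch of what that could look like, assuming llama.cpp as the inference engine and a locally downloaded GGUF model file (the model path here is a placeholder):

```shell
# Build llama.cpp (CPU-only by default) and run inference pinned to one thread.
# -t 1 limits generation to a single CPU thread, which is what makes it slow
# rather than hard; the model file is an assumption, substitute your own.
./llama-cli -m ./models/model.gguf -t 1 -p "Hello, world"
```

Throughput drops roughly in proportion to the thread count, so a single thread on commodity hardware produces tokens far slower than a GPU rack, but the output is the same.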