Hacker News

KeplerBoy · yesterday at 12:48 PM

It's not even hard, just slow. You could do it on a single cheap server (compared to a rack full of GPUs): run a CPU LLM inference engine and limit it to a single thread.
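A minimal sketch of what that could look like, assuming llama.cpp as the inference engine and a locally downloaded GGUF model file (the model path here is a placeholder):

```shell
# Build llama.cpp (CPU-only by default) and run inference pinned to one thread.
# -t 1 limits generation to a single CPU thread, which is what makes it slow
# rather than hard; the model file is an assumption, substitute your own.
./llama-cli -m ./models/model.gguf -t 1 -p "Hello, world"
```

Throughput drops roughly in proportion to the thread count, so a single thread on commodity hardware produces tokens far slower than a GPU rack, but the output is the same.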