Hacker News

yorwba · yesterday at 8:18 PM

GPUs are more efficient than CPUs for LLM inference, using less energy per token and being cheaper overall. Yes, a single data center GPU draws a lot of power and costs a fortune, but by batching many requests it can also serve far more people in the time your CPU or consumer GPU needs to respond to a single prompt.
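The "less energy per token" claim comes down to power draw divided by throughput. A minimal back-of-envelope sketch, where every number is a hypothetical placeholder (not a measurement of any real chip):

```python
# Back-of-envelope energy-per-token comparison.
# All numbers below are hypothetical placeholders, not measurements.

def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: power draw divided by throughput."""
    return watts / tokens_per_second

# Hypothetical data-center GPU: high power draw, but batching many
# concurrent requests yields very high aggregate throughput.
gpu = joules_per_token(watts=700.0, tokens_per_second=5000.0)

# Hypothetical CPU serving a single prompt at a time.
cpu = joules_per_token(watts=150.0, tokens_per_second=10.0)

print(f"GPU: {gpu:.3f} J/token")  # 0.140 J/token
print(f"CPU: {cpu:.3f} J/token")  # 15.000 J/token
```

With these made-up figures the GPU draws almost 5x the power but delivers 500x the tokens, so its energy per token is two orders of magnitude lower; the actual ratio depends entirely on the hardware and batch sizes involved.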


Replies

tolerance · yesterday at 8:22 PM

I got you, thanks!