Hacker News

jychang · today at 7:41 AM · 0 replies

That's not true at all. Running an LLM on-device would use MORE electricity per token.

Service providers that run batch>1 inference get far more tokens per watt, because each pass over the model weights serves many requests at once.

Local inference is stuck at batch=1, which pays the full cost of reading the weights for every single token.
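The amortization argument can be sketched with a toy model (all numbers below are hypothetical assumptions, not measurements): autoregressive decoding is typically memory-bandwidth-bound, so each decode step streams every weight from memory once, and that energy cost is split across however many sequences are in the batch.

```python
# Toy model of energy per token for memory-bandwidth-bound LLM decoding.
# All constants are illustrative assumptions, not real measurements.

WEIGHT_BYTES = 14e9   # hypothetical ~7B-parameter model at fp16 (~14 GB)
BANDWIDTH_BPS = 1e12  # hypothetical 1 TB/s accelerator memory bandwidth
POWER_WATTS = 300     # hypothetical power draw while decoding

def joules_per_token(batch_size: int) -> float:
    """Each decode step reads all weights once regardless of batch size,
    so the energy of that read is amortized across the whole batch."""
    step_seconds = WEIGHT_BYTES / BANDWIDTH_BPS  # time to stream weights once
    step_joules = POWER_WATTS * step_seconds     # energy for one decode step
    return step_joules / batch_size              # one token per sequence

local = joules_per_token(1)    # batch=1: full weight read paid per token
server = joules_per_token(32)  # batch=32: the same read serves 32 tokens
print(f"batch=1:  {local:.2f} J/token")   # → 4.20 J/token
print(f"batch=32: {server:.2f} J/token")  # → 0.13 J/token
```

Under these assumed numbers, batch=32 is ~32x more energy-efficient per token; in practice the gain saturates once decoding becomes compute-bound, but the direction of the argument holds.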