kuil009, today at 1:27 AM

The positioning makes sense, but I’m still somewhat skeptical.

The power, cooling, and TCO limits it targets for inference are real, especially in air-cooled data centers.

But the benchmarks shown are narrow, and it’s unclear how well this generalizes across models and mixed production workloads. GPUs are inefficient here, but their flexibility still matters.