> The hardware to actually run them is the bottleneck, and it is a HUGE financial/practical bottleneck.
That's unsurprising: inference for agentic coding is extremely context- and token-intensive compared to general chat, especially if you want responses fast enough to feel real-time, rather than running coding tasks overnight in a batch and reviewing the results as they come in. Maybe we should go back to viewing "coding" as a batch task, where you submit a "job" to be queued for the big iron and wait for the results.
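To make the batch framing concrete, here's a minimal sketch of what that workflow might look like: jobs go into a queue, a single worker (the "big iron") processes them whenever it gets to them, and you collect the results afterwards instead of waiting on a live response. Everything here (`run_coding_job`, the job prompts) is hypothetical stand-in code, not any real inference API.

```python
import queue
import threading

def run_coding_job(prompt: str) -> str:
    # Stand-in for an expensive agentic-coding inference call
    # that might take minutes or hours on real hardware.
    return f"patch for: {prompt}"

jobs: queue.Queue = queue.Queue()
results: dict = {}

def worker() -> None:
    # The "big iron": drains the queue at its own pace.
    while True:
        prompt = jobs.get()
        if prompt is None:  # sentinel: batch is finished
            jobs.task_done()
            break
        results[prompt] = run_coding_job(prompt)
        jobs.task_done()

# Submit the overnight batch, then come back later for the results.
for task in ["fix flaky test", "refactor parser"]:
    jobs.put(task)
jobs.put(None)

t = threading.Thread(target=worker)
t.start()
jobs.join()
t.join()

print(results["fix flaky test"])
```

The point of the sketch is the shape of the interaction: submission and result-checking are decoupled, so the hardware can be fully utilized around the clock instead of sitting idle waiting to serve low-latency interactive sessions.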