> The hardware to actually run them is the bottleneck, and it is a HUGE financial/practical bottleneck.
That's unsurprising: inference for agentic coding is extremely context- and token-intensive compared to general chat, especially if you want responses fast enough to feel real-time, rather than running coding tasks overnight in a batch and reviewing the results as they come in. Maybe we should go back to viewing "coding" as a batch task, where you submit a "job" to be queued for the big iron and wait for the results.
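To make the batch framing concrete, here's a minimal sketch of what that workflow might look like: jobs go into a queue, a single worker (the "big iron") processes them whenever it gets to them, and you collect the results afterwards instead of waiting on a live response. Everything here (`run_coding_job`, the job prompts) is hypothetical stand-in code, not any real inference API.

```python
import queue
import threading

def run_coding_job(prompt: str) -> str:
    # Stand-in for an expensive agentic-coding inference call
    # that might take minutes or hours on real hardware.
    return f"patch for: {prompt}"

jobs: queue.Queue = queue.Queue()
results: dict = {}

def worker() -> None:
    # The "big iron": drains the queue at its own pace.
    while True:
        prompt = jobs.get()
        if prompt is None:  # sentinel: batch is finished
            jobs.task_done()
            break
        results[prompt] = run_coding_job(prompt)
        jobs.task_done()

# Submit the overnight batch, then come back later for the results.
for task in ["fix flaky test", "refactor parser"]:
    jobs.put(task)
jobs.put(None)

t = threading.Thread(target=worker)
t.start()
jobs.join()
t.join()

print(results["fix flaky test"])
```

The point of the sketch is the shape of the interaction: submission and result-checking are decoupled, so the hardware can be fully utilized around the clock instead of sitting idle waiting to serve low-latency interactive sessions.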