This may be relevant for parallelizable workloads. For context on my perspective: I come at this as someone exclusively concerned with sequential, non-parallelizable, single-user, single-system workloads.
If you have multiple chats going at the same time in your LLM web interface, that's already a parallelizable workload with respect to batched inference. And this broadly describes the more sophisticated users of LLMs (those using them for more than casual chit-chat), especially with the largest "pro" models. Parallelism is also quite applicable to agentic workloads.
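To make the batching point concrete, here's a toy sketch (not a real serving stack; `forward_batch` is a hypothetical stand-in for one batched model forward pass): several concurrent chat sessions each need a next token, and one batched call serves all of them, amortizing the per-call cost across sessions.

```python
def forward_batch(prompts):
    # Stand-in for a single batched forward pass: in a real server,
    # the sequences would be padded/stacked and run through the
    # model together, producing one next token per sequence.
    return [f"token_for({p!r})" for p in prompts]

# Three chats open at once in the web interface:
sessions = {
    "chat_a": "Explain batching",
    "chat_b": "Summarize this article",
    "chat_c": "Write a regex",
}

# One batched call serves all three concurrent chats, instead of
# three sequential single-prompt calls.
next_tokens = forward_batch(list(sessions.values()))
assert len(next_tokens) == len(sessions)
```

The same shape applies to agentic workloads: independent tool-use branches or subtasks can ride in the same batch.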