To add: the one service that was fast enough on the LLM side was Cerebras. The time to first token (TTFT) is incredibly fast (200-300 ms), and the throughput is 2,000 tokens/s for the 8B model. Together these make for a great conversational experience.
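To see why the two numbers together matter, here is a rough back-of-the-envelope sketch. It assumes, hypothetically, a 100-token reply and a constant decode rate after the first token, and it ignores network overhead. Under those assumptions the total wait is roughly TTFT plus tokens divided by throughput. The helper name `response_latency_ms` is made up for illustration.

```python
def response_latency_ms(ttft_ms: float, tokens: int, tokens_per_s: float) -> float:
    """Rough time to stream a full reply: first-token delay plus decode time.

    Assumes a constant decode rate (hypothetical simplification; real
    throughput varies and network latency is not included).
    """
    return ttft_ms + tokens / tokens_per_s * 1000.0


# Hypothetical 100-token conversational reply, using the figures quoted above.
for ttft in (200, 300):
    total = response_latency_ms(ttft, 100, 2000)
    print(f"ttft={ttft} ms -> ~{total:.0f} ms total")
```

At 2,000 tokens/s, a short reply takes only about 50 ms of decode time, so TTFT dominates and the whole answer arrives in well under half a second.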