criemen • today at 10:04 AM

One other thing I'd assume Anthropic is doing is routing all fast requests to the latest-gen hardware. They almost certainly have a diverse fleet of inference hardware (TPUs, GPUs of different generations), so fast requests would be served only by whatever is fastest, whereas the general inference workload would be spread more widely across the fleet.
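
To make the guess concrete, here's a minimal sketch of what tier-based routing could look like. Everything in it is made up for illustration: the `Pool` type, the pool names, the generation numbers, and the round-robin choice are all assumptions, and I have no knowledge of how Anthropic actually does this.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Pool:
    name: str
    generation: int  # hypothetical ranking: higher means newer hardware


def make_router(pools):
    """Return a route(tier) function over a hypothetical hardware fleet."""
    newest = max(p.generation for p in pools)
    fast_pools = cycle([p for p in pools if p.generation == newest])
    all_pools = cycle(pools)

    def route(tier):
        # "fast" requests only ever land on the newest generation;
        # every other tier is spread round-robin across the whole fleet.
        return next(fast_pools if tier == "fast" else all_pools)

    return route


# Invented fleet, purely for illustration.
fleet = [Pool("gpu-old", 1), Pool("tpu-mid", 2), Pool("gpu-new", 3)]
route = make_router(fleet)
```

Calling `route("fast")` always picks from the newest pool, while `route("standard")` cycles through all three. A real scheduler would presumably also weigh load, capacity, and model placement, which this ignores.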
