Sure. Claude does that. "Cogitated for 1m 50s" doesn't work for real-time applications.
You can submit many queries in parallel to increase throughput. Smaller models and faster hardware can reduce the time per query too.
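
To illustrate the parallel part, here is a minimal sketch. `fake_query` is a hypothetical stand-in that just sleeps to simulate a slow model call (it is not any real API), and the 0.2 s latency and worker count are made-up numbers. Each individual query still takes just as long; only the total number finished per second goes up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(prompt: str, latency_s: float = 0.2) -> str:
    """Hypothetical stand-in for a slow model call; sleeps instead of calling a real API."""
    time.sleep(latency_s)
    return f"answer to: {prompt}"


def run_parallel(prompts, max_workers=8):
    """Submit all prompts concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_query, prompts))


if __name__ == "__main__":
    prompts = [f"question {i}" for i in range(8)]
    start = time.perf_counter()
    answers = run_parallel(prompts)
    elapsed = time.perf_counter() - start
    # Run one after another, 8 queries would take about 8 * 0.2 s.
    print(f"{len(answers)} answers in {elapsed:.2f}s")
```

Threads are enough here because the work is waiting on I/O. In practice the provider's rate limits would cap how many requests you can have in flight.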