it took me a lot of tinkering to get this feeling seamless in my own apps that use the api under the hood. i ended up buffering every token into a redis stream (with a final db save at the end of streaming) and building a mechanism to let clients reconnect to the stream on demand. no websocket necessary.
works great for kicking off a request and closing tab or navigating away to another page in my app to do something.
i dont understand why model providers dont build this resilient token streaming into all of their APIs. would be a great feature
exactly. they need to bring in spotify level of caching of streaming music that it just works if you're in a subway. Constant availability should be table stakes for them.