Except the providers also cache the parsing of the prompt (the KV cache), and that has substantial cost savings (easily an 80% saving on typical coding use cases).
That caching is done server side and not passed to the client. Which in turn means they still need state management on the server side, although it perhaps doesn't need the same level of global replication and availability.