I think the real case is a future technology. Similar to speculative decoding but done over servers.
Local model answers and reaches into the cloud for hard tokens.