They did explain a little bit:
> We’ll be able to do things like run fast models on the edge, run model pipelines on instantly-booting Workers, stream model inputs and outputs with WebRTC, etc.
The benefit to 3rd-party developers is reduced latency and a more robust AI pipeline. Instead of going back and forth with an HTTPS request at each stage to do inference, you could do it all in one request, e.g. realtime, pipelined STT, text translation, some backend logic, TTS, and back to the user's mobile device, roughly like the sketch below.
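To make that concrete, here's a minimal sketch of such a single-request pipeline as a Worker. It's illustrative only: it assumes a Workers AI binding named `AI`, and the model identifiers and response fields (`text`, `translated_text`, `audio`) are placeholders for whatever STT/translation/TTS models you actually deploy.

```ts
// Sketch: one Worker request runs the whole STT -> translate -> TTS pipeline,
// instead of the client making a separate round trip per stage.
// Assumes a Workers AI binding named `AI`; model names and response shapes
// below are placeholders, not a definitive API reference.

export interface Env {
  AI: { run(model: string, input: Record<string, unknown>): Promise<any> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // 1. Speech-to-text on the uploaded audio (assumed Whisper-style model).
    const audio = new Uint8Array(await request.arrayBuffer());
    const stt = await env.AI.run("@cf/openai/whisper", { audio: [...audio] });

    // 2. Translate the transcript (placeholder translation model).
    const translated = await env.AI.run("@cf/meta/m2m100-1.2b", {
      text: stt.text,
      source_lang: "en",
      target_lang: "es",
    });

    // 3. Any backend logic would slot in here (lookups, business rules, etc.).

    // 4. Text-to-speech on the result (placeholder TTS model, assumed to
    //    return base64-encoded audio in `audio`).
    const speech = await env.AI.run("@cf/myshell-ai/melotts", {
      prompt: translated.translated_text,
    });

    // 5. Stream the synthesized audio straight back in the same response.
    const audioBytes = Uint8Array.from(atob(speech.audio), (c) => c.charCodeAt(0));
    return new Response(audioBytes, {
      headers: { "Content-Type": "audio/mpeg" },
    });
  },
};
```

The point isn't the specific models; it's that every hop between stages stays inside the Worker instead of crossing the public internet back to the client.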
Does edge inference really solve the latency issue for most use cases? How does cost compare at scale?
You are seemingly answering something that they did not ask at all