It feels like more related to this: https://blog.cloudflare.com/how-cloudflare-runs-more-ai-mode...
And doing it their way than the traditional way.
Note: Their innovation seems to lie in smaller and fine-tuned ( broader catalog) models than larger ones. So Replicate seems a perfect match.
The traditional way: Rent a GPU and run inference on an container.