One thing I'd love to see is dynamic CPU allocation, or something similar to Jenkins' concept of a flyweight runner. Certain pipelines can spend minutes to hours using zero CPU, just polling for completion (e.g. CloudFormation, hosted E2E tests, etc.). In these cases I'd be charged for 2 vCPUs but use almost nothing.
Otherwise, customers are stuck with the same sizing/packing/utilisation problems. And imagine being the CI vendor in this world: you know which pipeline steps use what resources on average (and at the p99), and with that information you could oversubscribe customer jobs so that you sell 20 vCPUs but schedule them on 10 vCPUs. 200% utilisation, baby!
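A toy sketch of that oversubscription arithmetic (all numbers hypothetical, not from the article):

```python
# Toy model of vCPU oversubscription (all numbers hypothetical).
# If the average job only uses a fraction of its provisioned vCPUs,
# a vendor can sell more vCPUs than it physically has.

def oversubscription_ratio(sold_vcpus: int, physical_vcpus: int) -> float:
    """Sold capacity as a multiple of physical capacity."""
    return sold_vcpus / physical_vcpus

def expected_demand(sold_vcpus: int, avg_utilisation: float) -> float:
    """Expected physical vCPUs actually in use at any moment."""
    return sold_vcpus * avg_utilisation

# Sell 20 vCPUs backed by 10 physical ones, assuming jobs average 40% busy.
sold, physical, avg_util = 20, 10, 0.4
print(oversubscription_ratio(sold, physical))  # 2.0, i.e. "200% utilisation"
print(expected_demand(sold, avg_util))         # 8.0 vCPUs of real demand, under the 10 available
```

Of course the whole game is the tail: if the p99 demand exceeds the physical 10 vCPUs, someone's job gets throttled.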
Back in the ancient era of mainframes, this "multitenancy" concept would have been called "time sharing".
It looks like everything old is new again.
Surprised they're doing fixed leases. I would have thought a fixed base with a layer of spot-priced VMs for peaks would be more cost-efficient.
Was thinking about this exact thing today. Where I work, combining X services, each on its own scaling set, into a single Kubernetes cluster (or similar tech) should relatively "smooth out" the spikes, reduce wastage, and reduce the need to scale. This is on cloud, so there's no fixed-hardware concern, but even then it helps with reserved instances, discounts, and keeping costs down generally. This was intuition, but inspired by this I might actually do the maths on it now.
In my experience, scaling dynamically just makes things slower, and it doesn't reduce costs significantly compared to having dedicated resources.
Resourcing dynamically is also difficult because you don't actually know upfront how many resources your CI needs.
Everyone doing multi-tenant SaaS wants cost to be a sub-linear function of usage. This model of large unit capacity divided by small work units is an example of how to get there. The tough bit is that it's stepwise at low volumes and becomes linear at large scale, so it's only magic during the growth phase. That's pretty solid for a growth-phase company showing numbers for the next raise.
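To make the stepwise-then-linear shape concrete, a minimal sketch (unit sizes and prices are made up):

```python
import math

def capacity_cost(usage: float, unit_capacity: float, unit_cost: float) -> float:
    """Cost when capacity can only be bought in fixed units: stepwise in usage."""
    return math.ceil(usage / unit_capacity) * unit_cost

# At low volume the steps dominate: 1 unit of usage costs the same as 10.
print(capacity_cost(1, 10, 100))     # 100
print(capacity_cost(10, 10, 100))    # 100
# At large scale cost is effectively linear, ~unit_cost per unit_capacity of usage.
print(capacity_cost(1005, 10, 100))  # 10100, close to usage * 10
```

The "magic" region is where you've already bought the big unit and each marginal customer rides along at near-zero marginal cost.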
Interesting writeup. I do wonder what this looks like from the customer side; one downside I've observed with some serverless systems in the past is that they can introduce up-front latency as the system spins up capacity to handle your spike. I know the CI consensus seems to be that latency matters little in a process that's going to take a long time to run to completion anyway... but I'm also a developer of CI, and that latency is painful during a tight-loop development cycle.
(The good news is that if the spikes are regular, a sufficiently-advanced serverless can "prime the pump" and prep-and-launch instances into surplus compute before the spike since historical data suggests the spike is coming).
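One naive way to "prime the pump" from historical data (the threshold and the per-hour aggregation are assumptions, not anything from the article):

```python
from collections import defaultdict

def prewarm_schedule(history, threshold):
    """Pick hours of the day worth pre-warming instances for.

    history: list of (hour, job_count) observations from past days
             (hypothetical data).
    Pre-warm any hour whose average historical load exceeds `threshold`.
    """
    totals, counts = defaultdict(int), defaultdict(int)
    for hour, jobs in history:
        totals[hour] += jobs
        counts[hour] += 1
    return sorted(h for h in totals if totals[h] / counts[h] > threshold)

# Two days of (hour, jobs): a recurring 9am spike suggests warming capacity at 9.
history = [(9, 120), (10, 30), (9, 100), (10, 20)]
print(prewarm_schedule(history, 50))  # [9]
```

A real scheduler would want per-customer seasonality, weekday/weekend splits, and so on, but the shape of the idea is just this.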
tl;dr: for this particular case, it's bin packing
other business cases have economics where multitenancy has (almost) nothing to do with "efficient computing", and more to do with other efficiencies, like human costs, organizational costs, and (like the other post linked in the article) functional efficiencies
Am I reading this right: they're going to rack their own servers for this business?
If I were them I would be looking at renting from bargain-bin hosting providers like Hetzner or OVH to run this on. The great thing is that Hetzner also has a large pool of already-racked servers that you can tap into.
You are basically going to re-implement Hetzner at a smaller (and probably worse) scale by creating your own multi-tenant mini cloud for running these CI jobs.
Free advice: set up a giant Kubernetes cluster on Hetzner/OVH, use the gVisor runtime for isolation, submit CI workloads as k8s Jobs, and give the Jobs different priority classes based on job urgency and/or some sort of credit system; jobs will naturally be executed/preempted based on priority.
There you go, that is the product using nearly 100% existing and open source software.
Great write-up. OP's next steps are probably to offer off-peak capacity at a discount. If you pass along some of the savings to your customers, they'll happily take their once-daily or once-weekly jobs and make sure they get scheduled off-peak instead of getting scheduled at some time that was either random or arbitrary. Win-win.