Amazon is the same, I think? I live in constant fear we will have a runaway job one day. I have daily emails sent to me (as a manager) and to my finance person. We had one instance where a team member forgot to turn off a machine for a few months :(
I get why it is a business strategy to not have limits... but I wonder if providers would get more usage if people had more trust in cost predictability.
There's a coarse option: set up a budget and then a budget action. While ECS doesn't have GPU capabilities, the equivalent here would be a budget action that applies an IAM policy denying the expensive service's IAM actions. (An SCP is also available, but that requires an AWS Organization, at which point you've probably got a team that already knows this.)
It's coarse because it's daily and not hourly. However, you could also do some of this yourself by mapping CloudWatch metrics to a cost and then attaching an alarm action.
https://aws.amazon.com/blogs/mt/manage-cost-overruns-part-1/
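A rough sketch of both ideas, in case it helps. This is not taken from the linked post. The function names, the hourly rate and the cap are made up for illustration; the only real identifiers are the IAM actions `ec2:RunInstances` and `ec2:StartInstances`. The first function builds the kind of deny policy a budget action could attach, and the second does the "map a metric to a cost" arithmetic you would put behind an alarm.

```python
import json


def build_deny_policy(actions):
    """Build an IAM policy document that denies the given actions everywhere.

    Hypothetical helper: a budget action would attach a policy like this
    to a user, group, or role once the budget threshold is crossed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyExpensiveActionsOverBudget",
                "Effect": "Deny",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


def estimated_cost_usd(running_hours, hourly_rate_usd):
    """Turn a usage metric (hours running) into an approximate cost."""
    return running_hours * hourly_rate_usd


def should_alarm(running_hours, hourly_rate_usd, cap_usd):
    """Return True once the estimated cost reaches the cap."""
    return estimated_cost_usd(running_hours, hourly_rate_usd) >= cap_usd


# Example values are assumptions, not real prices.
policy = build_deny_policy(["ec2:RunInstances", "ec2:StartInstances"])
print(json.dumps(policy, indent=2))
print(should_alarm(running_hours=100, hourly_rate_usd=3.0, cap_usd=250.0))
```

Note the deny policy only stops new launches and starts. It does nothing about machines that are already running, so the forgotten-instance case still needs an alarm that stops or terminates something.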
> I get why it is a business strategy to not have limits...
What is the strategy? Is it purely market segmentation? (As in: "If you need to worry about spending too much, you're not the big-money kind of enterprise customer we want"?)
I remember going out to dinner, years ago, with a fairly senior AWS billing engineer. An acquaintance of a coworker.
He looked completely surprised when I asked about runaway billing and why there weren't any simple options to cap a given resource to prevent those cases.
His response was that they didn't build that because none of their customers wanted anything like that, as far as he was aware.