My sense is that behind the scenes there is a difficult trade-off between quantization and usage limits: you can have a "smart" model with poor limits, or good limits with a "dumb" model.
This seems very similar to mobile data limits (remember those years?), when there wasn't enough tower bandwidth to serve everyone unlimited data, so telcos were in constant tension between data caps and bandwidth throttling.
It wasn't until 5G came along, with a claimed 100x increase in network capacity, that they could finally give everyone "unlimited" data.