Time is a relatively good proxy for spend. There are also more ex post diagnostics like count and cost it can write to the status line.
I agree that ex ante it’s tough, and they could benefit from some mode of estimation.
Perhaps we can give tasks sizes, like T shirts? Or a group of claudes can spend the first 1M tokens assigning point values to the prospective tasks?
Even time doesn't feel like it would provide consistent information.
Take the response on another post about Claude Code.
https://news.ycombinator.com/item?id=47664442
This reads like even if you had a rough idea today about what usage might look like, a change deployed tomorrow could have a major impact on usage. And you wouldn't know it until after you were already using it.