Good point about the unit of consumption shifting from prompts to agent loops. That makes pricing even trickier for vertical-specific AI tools.
We see this firsthand building AI Workdeck (open-source AI workspace for legal teams). A single due diligence review might chain 20+ agent calls: OCR -> text extraction -> clause classification -> risk scoring -> evidence chain assembly. The user sees one action, but the backend burns through significant inference.
The interesting thing about vertical tools is the pricing model can be fundamentally different. Horizontal tools charge per seat or per token. But in legal, the value is in the document, not the seat. A lawyer reviewing a 500-page M&A file gets way more value than one reviewing a 2-page NDA.
Self-hosting changes the calculus too. Our users run on their own infra, so the AI cost is whatever their GPU costs. That makes $1,500/month caps less relevant and throughput optimization more important.