If "small models" is the bar, then you can run inference for ~$50 on Raspberry Pi-like hardware. I do that with 1.8B-4B models.
LFM 450M for vision tasks and Qwen 9B at Q4 for orchestration; this combination gives good results.