You can submit many queries in parallel to increase throughput. Smaller models and faster hardware can also reduce the time per query.
Ok. Claude will not work for this use case because none of the sample data (weirdly blurry ID images) is in the training data.
None of that gets you the 100ms response time the parent poster talked about for real-time uses like "who is at my doorbell?".
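
To make the throughput-versus-latency distinction concrete, here is a rough sketch with a simulated query. The 50 ms delay, `fake_query`, and `measure` are made-up stand-ins, not measurements of any real model or API. Running queries concurrently raises queries per second, but each individual query still takes about as long as before.

```python
import asyncio
import time

# Hypothetical per-query latency, chosen only for illustration.
SIMULATED_LATENCY_S = 0.05


async def fake_query(i: int) -> float:
    """Stand-in for a model call; returns how long this single query took."""
    start = time.perf_counter()
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return time.perf_counter() - start


async def run_batch(n: int):
    """Run n queries concurrently; return (wall time, mean per-query latency)."""
    start = time.perf_counter()
    latencies = await asyncio.gather(*(fake_query(i) for i in range(n)))
    wall = time.perf_counter() - start
    return wall, sum(latencies) / n


def measure(n: int) -> dict:
    """Summarize one concurrent batch of n simulated queries."""
    wall, mean_latency = asyncio.run(run_batch(n))
    return {
        "queries": n,
        "wall_s": wall,
        "mean_latency_s": mean_latency,
        "throughput_qps": n / wall,
    }


if __name__ == "__main__":
    for n in (1, 20):
        print(measure(n))
```

With 20 concurrent queries the throughput goes up by roughly the batch size, while the mean latency stays near the simulated 50 ms. Someone waiting at the doorbell only cares about the second number.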