> The prompt I used asks each model to return a confidence score (0 to 1) for every food item it identifies. All four models dutifully returned confidence scores for 100% of items. Surely we can use those to filter out bad estimates?
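In principle, thresholding on those self-reported scores is trivial; the catch is that the scores have to actually track accuracy for the filter to mean anything. A minimal sketch of the filtering step (the item structure and field names here are hypothetical, not from the original benchmark):

```python
# Sketch: keep only food items whose self-reported confidence clears
# a threshold. Field names ("name", "confidence") are assumptions.

def filter_by_confidence(items, threshold=0.8):
    """Drop items the model itself claims to be unsure about."""
    return [item for item in items if item["confidence"] >= threshold]

items = [
    {"name": "apple", "confidence": 0.95},
    {"name": "granola bar", "confidence": 0.62},
    {"name": "yogurt", "confidence": 0.88},
]

kept = filter_by_confidence(items)
print([item["name"] for item in kept])  # → ['apple', 'yogurt']
```

Of course, if a model reports high confidence on nearly every item regardless of accuracy, this filter discards almost nothing, which is exactly the failure mode the quoted passage is gesturing at.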
This is a problem with the companies selling the AI models, not with their customers. It is the vendors' responsibility to inform consumers about the limits of their services, and to train the models to say "I don't know; there is not enough information."