I am working on an app to detect tooth problems. I envision it as a quick check for the large majority of people who don't have regular dental care. Detection will come late, but that is still a good alternative to doing nothing.
I am experimenting with the current SOTA multimodal LLMs, but performance is not there yet: they still hallucinate non-existent teeth. (As an aside, I have found a simple but very telling test. I have an image with only 4 teeth visible on top and 10 on the bottom, and I prompt the model to count them. None have been able to, though Gemini 2.5 Pro comes closest of the lot. When a model fails the counting test, its description of the teeth is worse too.)
I am going to try segmenting the image to see whether prompting the model to describe it segment by segment gives better results.
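
For what it's worth, here is a rough sketch of what the segment-by-segment idea could look like. It assumes a plain grid split with a small overlap, which may not be the right way to segment a mouth photo, and every name in it (`tile_boxes`, `describe_by_segment`, the `describe` callback) is made up for illustration. The `describe` callback stands in for whatever model call ends up being used; no real API is shown.

```python
from typing import Callable, Iterator

# (left, top, right, bottom) in pixels
Box = tuple[int, int, int, int]


def tile_boxes(width: int, height: int, rows: int, cols: int,
               overlap: int = 0) -> Iterator[Box]:
    """Yield crop boxes that cover a width x height image in a rows x cols grid.

    Each box is padded by `overlap` pixels on every side (clamped to the image
    bounds) so a tooth sitting on a grid line shows up whole in at least one tile.
    """
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            top = r * height // rows
            right = (c + 1) * width // cols
            bottom = (r + 1) * height // rows
            yield (max(0, left - overlap), max(0, top - overlap),
                   min(width, right + overlap), min(height, bottom + overlap))


def describe_by_segment(width: int, height: int,
                        describe: Callable[[Box], str],
                        rows: int = 2, cols: int = 4,
                        overlap: int = 16) -> list[tuple[Box, str]]:
    """Call the hypothetical `describe` callback once per tile, keeping the box
    next to each answer so findings can be mapped back to the full image."""
    return [(box, describe(box))
            for box in tile_boxes(width, height, rows, cols, overlap)]


if __name__ == "__main__":
    # Stand-in callback; a real one would crop the image and prompt the model.
    for box, text in describe_by_segment(1200, 800, lambda b: "(model output)"):
        print(box, text)
```

The boxes use the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the callback can crop directly if Pillow is what loads the image. With overlap, the same tooth can appear in two tiles, so per-tile counts cannot simply be summed.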