Hacker News

alejandrorivas · today at 7:56 PM

iNaturalist's computer vision model is trained on the community's own verified observations, which creates a nice feedback loop. The current model (built on a vision transformer architecture) can suggest IDs for around 76,000 taxa, and it is retrained periodically as more research-grade observations come in. What's less well known is that datasets built from those observations (the iNat competition datasets, distributed through GitHub) are publicly available and have become a standard benchmark in fine-grained visual classification research, used in papers from Google and Meta, among others. The fact that a citizen science platform accidentally produced one of the most important biodiversity ML datasets is kind of remarkable.