I don't know. This paper [1] reports accuracies in the 97-98% range on a similar task with more powerful models. With Gemma 2 2B the accuracy will certainly be lower.
[1] https://www.medrxiv.org/content/10.1101/2024.10.01.24314702v...
> I don't know.
HN in a nutshell: I've built some cool tech but have no idea if it is helpful or even counterproductive...
Y'all definitely need to cross-validate a small number of samples by hand. When I did this kind of research, I would hand-validate to at least p < .01.
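The comment doesn't say which test is meant, so here is one possible reading as a sketch: treat each hand-checked sample as a pass/fail trial and run a one-sided exact binomial test of "true accuracy is at most p0". The threshold p0, alpha = 0.01, and both function names are hypothetical choices for illustration, not anything from the thread or the paper.

```python
from math import comb


def binomial_tail_pvalue(correct: int, n: int, p0: float) -> float:
    """One-sided exact binomial test (hypothetical helper).

    Returns P(X >= correct) for X ~ Binomial(n, p0): the chance of seeing at
    least `correct` right answers out of n if the true accuracy were only p0.
    """
    if not 0 <= correct <= n:
        raise ValueError("correct must be between 0 and n")
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(correct, n + 1))


def min_samples_all_correct(p0: float, alpha: float = 0.01) -> int:
    """Smallest n for which n/n correct rejects 'accuracy <= p0' at level alpha.

    If every hand-checked sample is correct, the p-value is p0**n,
    so we need the first n with p0**n < alpha.
    """
    n = 1
    while p0**n >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    # Assumed threshold: we want evidence that accuracy is above 90%.
    print(min_samples_all_correct(0.9))  # 44
    print(binomial_tail_pvalue(44, 44, 0.9))
```

Even in the best case (no errors at all) you need 44 hand-checked samples to claim better than 90% accuracy at p < .01, and any mistakes push that number up. So "a small number of samples" is small relative to the dataset, not tiny in absolute terms.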