This may exist already, but I'd like to find a way to query 'Supplementary Material' in biomedical research papers for genes / proteins or even biological processes.
As it is, Supplementary Materials are inconsistently indexed, so a lot of insight you might get from the last 15 years of genomics or proteomics work is effectively invisible.
I imagine this approach could work, especially for Open Access data?
I just built something like this a week ago: https://github.com/eamag/papers2dataset
I wanted to find all cryoprotective agents that were tested at different temperatures, but it should be extendable to your problem too. It uses OpenAlex to traverse a citation graph and pull the open-access PDFs.
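For anyone curious what "traverse a citation graph via OpenAlex" looks like in practice, here is a minimal sketch. It is not papers2dataset's code. It assumes the public OpenAlex REST endpoint `https://api.openalex.org/works/{id}` and two fields of a Work record: `referenced_works` (a list of `https://openalex.org/W…` IDs) and `best_oa_location.pdf_url`. The function names (`crawl_references`, `oa_pdf_url`, etc.) are hypothetical, and the fetcher is injectable so the traversal can be tried offline.

```python
# Illustrative sketch only: breadth-first walk over OpenAlex "referenced_works",
# recording an open-access PDF link (if any) for each paper visited.
# Assumes the public OpenAlex API; all function names here are hypothetical.
import json
import urllib.request
from collections import deque
from typing import Callable, Dict, Optional

OPENALEX_WORK_URL = "https://api.openalex.org/works/{}"


def fetch_work(work_id: str) -> dict:
    """Fetch one Work record (e.g. 'W2741809807') from OpenAlex as a dict."""
    with urllib.request.urlopen(OPENALEX_WORK_URL.format(work_id), timeout=30) as resp:
        return json.load(resp)


def short_id(openalex_url: str) -> str:
    """Turn 'https://openalex.org/W123' into 'W123'."""
    return openalex_url.rsplit("/", 1)[-1]


def oa_pdf_url(work: dict) -> Optional[str]:
    """Return the best open-access PDF URL, or None if there isn't one."""
    location = work.get("best_oa_location") or {}
    return location.get("pdf_url")


def crawl_references(
    seed_id: str,
    max_depth: int = 1,
    max_works: int = 50,
    fetch: Callable[[str], dict] = fetch_work,
) -> Dict[str, Optional[str]]:
    """BFS from seed_id over references; returns {work_id: pdf_url or None}."""
    seen = {seed_id}
    queue = deque([(seed_id, 0)])
    results: Dict[str, Optional[str]] = {}
    while queue and len(results) < max_works:
        work_id, depth = queue.popleft()
        work = fetch(work_id)
        results[work_id] = oa_pdf_url(work)
        if depth >= max_depth:
            continue  # don't expand references beyond the depth limit
        for ref in work.get("referenced_works", []):
            ref_id = short_id(ref)
            if ref_id not in seen:
                seen.add(ref_id)
                queue.append((ref_id, depth + 1))
    return results
```

Papers with no open-access copy come back as `None`, so you can filter them out before downloading PDFs and mining them (or their supplementary files) for genes, proteins, or whatever you're after. To follow citing papers instead of references, you would swap the expansion step for a query on the works endpoint filtered by `cites:`.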