logoalt Hacker News

stanford_labratlast Wednesday at 5:45 PM1 replyview on HN

So I'm a biomedical scientist (in training I suppose...I'm in my 3rd year of a Genetics PhD) and I have seen this trend a couple of times now where AI developers tout that AI will accelerate biomedical discovery through a very specific argument that AI will be smarter and generate better hypotheses than humans.

For example in this Google essay they make the claim that CRISPR was a transdisciplinary endeavor, "which combined expertise ranging from microbiology to genetics to molecular biology" and this is the basis of their argument that an AI co-scientist will be better able to integrate multiple fields at once to generate novel and better hypothesis. For one, what they fail to understand as computer scientists (I suspect due to not being intimately familiar with biomedical research) is that microbio/genetics/mol bio are closer linked than you may expect as a lay person. There is no large leap between microbiology and genetics that would slow down someone like Doudna or even myself - I use techniques from multiple domains in my daily work. These all fall under the general broad domain of what I'll call "cellular/micro biology". As another example, Dario Amodei from Claude also wrote something similar in his essay Machines of Loving Grace that the limiting factor in biomedical is a lack of "talented, creative researchers" for which AI could fill the gap[1].

The problem with both of these ideas is that they misunderstand the rate-limiting factor in biomedical research. Which to them is a lack of good ideas. And this is very much not the case. Biologists have tons of good ideas. The rate limiting step is testing all these good ideas with sufficient rigor to either continue exploring that particular hypothesis or whether to abandon the project for something else. From my own work, the hypothesis driving my thesis I came up with over the course of a month or two. The actual amount of work prescribed by my thesis committee to fully explore whether or not it was correct? 3 years or so worth of work. Good ideas are cheap in this field.

Overall I think these views stem from field specific nuances that don't necessarily translate. I'm not a computer scientist, but I imagine that in computer science the rate limiting factor is not actually testing out hypothesis but generating good ones. It's not like the code you write will take multiple months to run before you get an answer to your question (maybe it will? I'm not educated enough about this to make a hard claim. In biology, it is very common for one experiment to take multiple months before you know the answer to your question or even if the experiment failed and you have to do it again). But happy to hear from a CS PhD or researcher about this.

All this being said I am a big fan of AI. I try and use ChatGPT all the time, I ask it research questions, ask it to search the literature and summarize findings, etc. I even used it literally yesterday to make a deep dive into a somewhat unfamiliar branch of developmental biology more easy (and I was very satisfied with the result). But for scientific design, hypothesis generation? At the moment, useless. AI and other LLMs at this point are a very powerful version of google and code writer. And it's not even correct 30% of the time to boot so you have to be extremely careful when using it. I do think that wasting less time exploring hypotheses that are incorrect or bad is a good thing. But the problem here is that we can pretty easily identify good and bad hypotheses already. We don't need AI for that, what takes time is the actual amount of testing of these hypotheses that slows down research. Oh and politics, which I doubt AI can magic away for us.

[1] https://darioamodei.com/machines-of-loving-grace#1-biology-a...


Replies

colingauvinlast Wednesday at 6:07 PM

It's pretty painful watching CS try to turn biology into an engineering problem.

It's generally very easy to marginally move the needle in drug discovery. It's very hard to move the needle enough to justify the cost.

What is challenging is culling ideas, and having enough SNR in your readouts to really trust them.

show 1 reply