Hacker News

Launch HN: Tamarind Bio (YC W24) – AI Inference Provider for Drug Discovery

75 points by denizkavi last Tuesday at 5:49 PM | 17 comments

Hi HN, we're Deniz and Sherry from Tamarind Bio (https://www.tamarind.bio). Tamarind is an inference provider for AI drug discovery, serving models like AlphaFold. Biopharma companies use our library of leading open-source models to design new medicines computationally.

Here’s a demo: https://youtu.be/luoMApPeglo

Two years ago, I was hired at a Stanford lab to run models for my labmates. A post-doc would ask me to run a set of 1-5 models in sequence on tens of thousands of inputs, and I would email back the results after setting up the workflow on the university cluster.

At some point, it became unreasonable that all of an organization's computational biology work would go through an undergrad, so we built Tamarind as a single place for all molecular AI tools, usable at massive scale with no technical background needed. Today, we are used by many of the top 20 pharma companies, dozens of biotechs and tens of thousands of scientists.

When we started getting adoption in the big pharma companies, we found that the same problem persisted there. I know directors of data science for whom half the job could be described as running scripts for other people.

Lots of companies have also deprecated their internally built solutions to switch over, since dealing with GPU infra and onboarding Docker containers is not a very exciting problem when the company you work for is trying to cure cancer.

Unlike non-specialized inference providers, we build both a programmatic interface for developers and a scientist-friendly web app, since most of our users are non-technical. Some of them used to extract proteins from animal blood before they replaced that process by generating proteins with AI on Tamarind.

Besides grinding out images for each of the models we serve, we’ve designed a standardized schema to be able to share each model’s data format. We’ve built a custom scheduler and queue optimized for horizontal scaling (each inference call takes minutes to hours, and runs on one GPU at a time), while splitting jobs across CPUs and GPUs for optimal timing.
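To make the queueing idea above concrete, here is a minimal sketch of one way to spread one-GPU jobs across a pool. It is not Tamarind's scheduler, which the post does not describe in detail; the `Job` class, the `est_minutes` runtime estimate and the greedy longest-job-first policy are all assumptions chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    est_minutes: float                # hypothetical runtime estimate
    name: str = field(compare=False)  # excluded from ordering


def schedule(jobs, num_gpus):
    """Greedy longest-job-first: each job goes to the GPU that frees up first."""
    # Min-heap of (minute at which the GPU becomes free, gpu_id).
    gpus = [(0.0, gpu_id) for gpu_id in range(num_gpus)]
    heapq.heapify(gpus)
    plan = {gpu_id: [] for gpu_id in range(num_gpus)}
    for job in sorted(jobs, reverse=True):
        free_at, gpu_id = heapq.heappop(gpus)
        plan[gpu_id].append(job.name)  # one job per GPU at a time
        heapq.heappush(gpus, (free_at + job.est_minutes, gpu_id))
    makespan = max(free_at for free_at, _ in gpus)
    return plan, makespan


jobs = [Job(60, "a"), Job(30, "b"), Job(30, "c"), Job(10, "d")]
plan, makespan = schedule(jobs, num_gpus=2)
```

Because every job occupies exactly one GPU, adding GPUs shortens the total wall-clock time until the longest single job becomes the bottleneck, which is the sense in which this kind of workload scales horizontally.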

As we've grown to handle a substantial portion of the biopharma R&D AI demand on behalf of our customers, we've expanded beyond just offering a library of open source protocols.

A common use case we saw from early on was the need to connect multiple models together into pipelines and to have reproducible, consistent protocols that replace physical experiments. Once we became the place to build internal tools for computational science, our users started asking if they could onboard their own models to the platform.
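As a rough illustration of chaining models into a reproducible pipeline, the sketch below passes a plain dict from step to step and fingerprints the result so reruns can be compared. Everything here is hypothetical: `run_pipeline`, `fake_fold` and `fake_score` are placeholder names, the steps are stubs rather than real models, and none of it reflects Tamarind's API.

```python
import hashlib
import json
from typing import Callable

Record = dict
Step = Callable[[Record], Record]


def run_pipeline(steps: list[tuple[str, Step]], record: Record) -> Record:
    """Run each step in order, feeding its output into the next one."""
    history = []
    for name, step in steps:
        record = step(record)
        history.append(name)
    # Fingerprint the final record so two runs can be checked for consistency.
    payload = json.dumps(record, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    return {**record, "steps": history, "digest": digest}


# Placeholder steps standing in for real models (assumptions, not real tools).
def fake_fold(record: Record) -> Record:
    return {**record, "structure": f"folded({record['sequence']})"}


def fake_score(record: Record) -> Record:
    return {**record, "score": len(record["sequence"])}


result = run_pipeline(
    [("fold", fake_fold), ("score", fake_score)],
    {"sequence": "MKT"},
)
```

Keeping every step's input and output in one shared record shape is what lets steps be reordered or swapped without glue code, and the digest gives a cheap check that the same inputs produced the same outputs.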

From there, we now support fine-tuning, building UIs for arbitrary docker containers, connecting to wet lab data sources and more!

Reach out to me at deniz[at]tamarind.bio if you’re interested in our work. We are hiring! Check out our product at https://app.tamarind.bio and let us know if you have any feedback to support how the biotech industry uses AI today.


Comments

washedDeveloper last Tuesday at 7:23 PM

The org I work at develops HTCondor. We have a lot of scientists that end up running AlphaFold and other bio-related models on our pool of GPUs and CPUs. I am curious to know how and why your team implemented yet another job scheduler. HTCondor is agnostic to the software being run, so maybe there is more clever scheduling you can come up with. That being said, HTCondor also has pretty high flexibility with regards to policy.

brandonb last Tuesday at 6:03 PM

Congrats on the launch. I always love to see smart ML founders applying their talents to health and bio.

What were the biggest challenges in getting major pharma companies onboard? How do you think it was the same or different compared to previous generations of YC companies (like Benchling)?

the__alchemist last Tuesday at 7:33 PM

Cool project! I have a question based on the video: What sort of work is it doing from the "Upload mmCIF file and specify number of molecules to generate" query? That seems like a broad ask. For example, is it performing ML inference on a data set of protein characteristics, or pockets in that protein? Using a ligand DB, or generating ligands? How long does that run take?

conradry yesterday at 4:52 AM

You may find this library I wrote a couple years ago interesting: https://github.com/conradry/prtm. Curious about why you chose to make separate images for each model instead of copy-pasting source code into a big monorepo (similar to Huggingface transformers).

Akshay0308 last Tuesday at 6:50 PM

That's really cool! How much do scientists at big pharma use open-source models as opposed to models trained on their proprietary data? Do you guys have tie-ups to provide inference for models used internally at big pharma trained on proprietary data?

machbio last Tuesday at 6:07 PM

Looks good - would have really appreciated it if the pricing page contained any examples of pricing instead of "book a meeting"

johnsillings last Tuesday at 11:44 PM

selling to big pharma companies as a startup is hard, so huge props on getting adoption there. the product looks very slick.

t_serpico last Tuesday at 7:40 PM

nice stuff! how do you handle security concerns big pharma may have? wouldn't they just run their stuff on-prem?
