Hacker News

puzer, last Tuesday at 1:24 PM (2 replies)

TL;DR

- The Idea: People use GitHub Stars as bookmarks. This is an excellent signal for understanding which repositories are semantically similar.

- The Data: Processed ~1TB of raw data from GitHub Archive (BigQuery) to build an interest matrix of 4 million developers.

- The ML: Trained embeddings for 300k+ repositories using Metric Learning (EmbeddingBag + MultiSimilarityLoss).

- The Frontend: Built a client-only demo that runs vector search (KNN) directly in the browser via WASM, with no backend involved.

- The Result: The system finds non-obvious library alternatives and allows for semantic comparison of developer profiles.
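For readers who want a concrete picture of the search and profile-comparison steps, here is a minimal pure-Python sketch. It is not the project's code. It assumes a user profile is the mean of the embeddings of the repositories that user starred (mean pooling is the default mode of PyTorch's `EmbeddingBag`, but the post does not say which mode was used) and that similarity is cosine similarity. The repository names, the 3-dimensional vectors, and the helper names (`mean_pool`, `cosine`, `knn`, `similar_repos`, `profile_vector`) are all made up for illustration.

```python
import math

# Hypothetical toy embeddings; real ones would come from the trained model.
REPO_EMBEDDINGS = {
    "vector-db-a": [0.9, 0.1, 0.0],
    "vector-db-b": [0.8, 0.2, 0.1],
    "web-framework": [0.0, 0.9, 0.3],
    "css-toolkit": [0.1, 0.7, 0.6],
}


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors (a 'bag' of embeddings)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn(query, embeddings, k=3, exclude=()):
    """Brute-force nearest neighbours: score every candidate, keep the top k."""
    scored = [
        (cosine(query, vec), name)
        for name, vec in embeddings.items()
        if name not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


def similar_repos(repo, k=2):
    """Repositories closest to `repo`, excluding the repo itself."""
    return knn(REPO_EMBEDDINGS[repo], REPO_EMBEDDINGS, k=k, exclude={repo})


def profile_vector(starred):
    """Represent a developer as the mean of their starred repos' embeddings."""
    return mean_pool([REPO_EMBEDDINGS[name] for name in starred])
```

Two profiles built with `profile_vector` can be compared with `cosine` directly, and `knn(profile_vector(starred), REPO_EMBEDDINGS, exclude=set(starred))` gives recommendations for one user. A brute-force scan like this is only reasonable for small collections; the demo described above runs its search through WASM in the browser.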


Replies

amelius, yesterday at 6:02 PM

This reminds me of the Netflix prize.

https://en.wikipedia.org/wiki/Netflix_Prize

ashvardanian, yesterday at 6:57 PM

Cool project! And thanks for mentioning "unum-cloud/USearch" among repo examples :)