Hacker News

rohanat today at 6:06 PM

Have you considered a deterministic tier before the embedding pass? I feel that approach could be more efficient.


Replies

rkochanowski today at 6:26 PM

There are good, mature tools for deterministic duplicate detection, so I intentionally focused on an embedding-based approach to fill this gap (I didn't find other tools that use it).

If by "more efficient" you mean avoiding embedding the same code multiple times, that optimization is already implemented internally.
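For readers wondering what "not embedding the same code twice" might look like, here is a minimal sketch, not the tool's actual implementation. It assumes chunks are keyed by a SHA-256 hash of lightly normalized text; `normalize`, `embed_unique`, and the `embed` callable are hypothetical names made up for illustration.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def normalize(code: str) -> str:
    """Drop trailing whitespace and blank lines so trivially different copies hash alike."""
    lines = [line.rstrip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line)


def embed_unique(chunks: Sequence[str], embed: Callable[[str], Vector]) -> List[Vector]:
    """Return one vector per chunk, calling `embed` only once per distinct chunk.

    `embed` is a stand-in for whatever embedding model is used (an assumption).
    """
    cache: Dict[str, Vector] = {}
    vectors: List[Vector] = []
    for chunk in chunks:
        # Deterministic key: identical normalized text always maps to the same hash.
        key = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = embed(chunk)  # the expensive call, made once per unique chunk
        vectors.append(cache[key])
    return vectors
```

This only catches exact copies after normalization. Near-duplicates with renamed variables or reordered statements still need the embedding comparison.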