This is a great use case for embeddings. Code deduplication across distant modules is notoriously hard for traditional AST-based tools.
How do you handle parsing and chunking across different languages so that the embeddings actually capture semantic meaning? For instance, do you chunk by function/class, or use a fixed token window? A function that is too long or too short can drastically skew the embedding similarity.
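To make the question concrete, here is a rough sketch of the hybrid I have in mind: chunk by function/class, skip definitions that are too short, and fall back to a fixed window for ones that are too long. It is Python-only because it leans on the standard-library `ast` module; other languages would need a different parser (tree-sitter is one option I'd guess at). The name `chunk_source` and both thresholds are made up for illustration, and the window is counted in lines rather than tokens to keep it simple. I'm not assuming this is what your tool does.

```python
import ast

# Hypothetical thresholds, chosen only for illustration.
MAX_LINES = 40  # definitions longer than this are split into fixed windows
MIN_LINES = 3   # definitions shorter than this are skipped as too small


def chunk_source(source, max_lines=MAX_LINES, min_lines=MIN_LINES):
    """Return (name, text) chunks: one per function/class, windowing long ones.

    Note: nested definitions (e.g. methods) are emitted in addition to the
    enclosing class, so chunks can overlap.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # lineno/end_lineno are 1-based and inclusive (end_lineno needs Python 3.8+).
        body = lines[node.lineno - 1 : node.end_lineno]
        if len(body) < min_lines:
            continue  # too short to embed meaningfully on its own
        if len(body) <= max_lines:
            chunks.append((node.name, "\n".join(body)))
        else:
            # Fallback: split an oversized definition into fixed-size line windows.
            for start in range(0, len(body), max_lines):
                window = body[start : start + max_lines]
                chunks.append((f"{node.name}[{start // max_lines}]", "\n".join(window)))
    return chunks
```

Each chunk's text would then go to whatever embedding model you use. Curious whether you do something like this or handle the length problem differently.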