logoalt Hacker News

coffeecodersyesterday at 6:30 PM0 repliesview on HN

I have been building something like this but for personal use.

As of now, I use SentenceTransformer model to chunk files, blip for captioning (“Family vacation in Banff, February 2025”)) and mtcnn with InsightFace for face detection. My index stores captions, face embeddings, and EXIF metadata (date, GPS) for queries like “show photos of us in Banff last winter.” I’m working on integrating ChromaDB for faster searches.

Eventually, I aim to store indexes as:

{

  "filename": "/Vacation/Banff/Wife.jpg",

  "chunk_id": 0,

  "text": "Family at Banff, February 2025",

  "caption_embedding": [0.1, 0.2, ...],

  "face_embeddings": [{"name": "NT", "embedding": [0.3, 0.4, ...]}, ...],

  "exif": {
     
     "DateTimeOriginal": "2025:02:15",

     "GPSCoordinates": "18.387, -65.992"

    }
}

I also built an UI (like Spotlight Search) to search through these indexes.

Code (in progress): https://github.com/neberej/smart-search