Seems like the search is based only on the transcript/dialogue, not on image embeddings. It would be super cool to use CLIP or a similar embedding search on these for more effective fuzzy lookup.
How would someone go about doing this, just curious?
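A rough sketch of the general idea, not how this site actually works: CLIP-style models map images and text into the same vector space, so you would embed every frame once ahead of time, embed the text query at search time, and rank frames by cosine similarity. The filenames and three-dimensional vectors below are made-up placeholders standing in for real model output (real embeddings have hundreds of dimensions), and `search` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Placeholder embeddings: in a real setup these would come from running
# an image-embedding model over each frame once and storing the result.
index = {
    "frame_001.jpg": [0.9, 0.1, 0.0],
    "frame_002.jpg": [0.1, 0.8, 0.2],
    "frame_003.jpg": [0.0, 0.2, 0.9],
}

# Placeholder for the embedded text query.
query = [0.8, 0.2, 0.1]

results = search(query, index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

For more than a few thousand frames, the brute-force loop would normally be swapped for an approximate nearest-neighbor index, but the ranking logic stays the same.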