Hacker News

butlike, last Monday at 1:07 PM

The path is different from the filename, though. If I want to find duplicates, it will be impossible if the filename changes. In my use case,

/User/user/Images/20240110/happy_birthday.jpg

and

/User/user/Desktop/happy_birthday.jpg

are the same image.


Replies

dns_snek, last Monday at 1:14 PM

> it will be impossible if the filename changes.

Not impossible, just different and arguably better - comparing hashes is a better tool for finding duplicates.
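
A minimal sketch of the hash approach, assuming exact byte-for-byte copies and using only the Python standard library. The names `file_digest` and `find_duplicates` are made up for illustration, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content hash.

    Returns only the groups with more than one file. Names and
    directories play no part in the comparison.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates` on a home directory would put the two `happy_birthday.jpg` copies from the parent comment in one group, as long as their bytes are identical, and would still do so if one of them were renamed.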

tart-lemonade, last Monday at 2:18 PM

If your camera (or phone) uses the DCF standard [0], you will eventually end up with duplicates when you hit IMG_9999.JPG and it loops around to IMG_0001.JPG. Filename alone is an unreliable indicator.

[0]: https://en.wikipedia.org/wiki/Design_rule_for_Camera_File_sy...

adolph, last Monday at 4:53 PM

> If I want to find duplicates, it will be impossible if the filename changes.

Depends on what is meant by a "duplicate." It would be a good idea to get a checksum of the file, which can detect exact data duplicates, but not cases where metadata was removed or the image was rescaled. Perceptual hashing is more expensive but is better at matching rescaled or cropped images.

https://en.wikipedia.org/wiki/Perceptual_hashing
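
To make the idea concrete, here is a rough sketch of one simple perceptual hash, the "average hash." It is a toy under stated assumptions: the image is already decoded to a 2D list of grayscale values at least 8x8 in size (real code would use an imaging library for decoding), and the function names are hypothetical. It tolerates rescaling but is not expected to cope with cropping; more robust schemes exist.

```python
def downsample(pixels, size=8):
    """Shrink a 2D grayscale grid to size x size by averaging blocks.

    Assumes the grid is at least size x size.
    """
    height, width = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set if above the mean."""
    flat = [v for row in downsample(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Two images are treated as likely matches when the Hamming distance between their hashes is small. The threshold is a judgment call that depends on the collection, so none is suggested here.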