Hi everyone,
> Careful here - algorithms that spot almost-duplicates will happily
> flag different shots from the same shoot. Definitely not something to
> act upon without close human inspection.

I agree, and I wouldn't want to flag anything automatically based on
our findings.
The algorithm we use is meant to capture verbatim re-use, not
derivative works. This means that it does a very poor job at matching
images that are different photographic reproductions of the same work
(lighting conditions, angles, borders, etc. will all differ). It does a
fairly good job at matching images that are verbatim copies, allowing
for resizing and format changes, but it's not perfect, and we
definitely end up with the same hash for some images, even if they're
not identical. This happens often with maps, for instance. Take two
maps of US states, one marking Washington in red and one marking
California in red: with no other differences, they'll end up hashed
very close to each other.
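
To make the map case concrete, here is a minimal sketch. It uses a
simple block-average hash as a stand-in, which is an assumption for
illustration and not necessarily the algorithm described above. The
64x64 synthetic "maps", the gray values, and all function names are
made up for the example.

```python
# Illustrative stand-in: a block-average perceptual hash on synthetic images.
LAND, OCEAN, RED = 200, 60, 76  # hypothetical grayscale values


def make_map(top, left, size):
    """Build a 64x64 grayscale 'map' with one square region shaded red."""
    img = [[LAND if y < 40 else OCEAN for _ in range(64)] for y in range(64)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            img[y][x] = RED
    return img


def average_hash(pixels, grid=8):
    """Average each block of a grid x grid tiling, then threshold the
    block means against their overall mean to get grid*grid bits."""
    block = len(pixels) // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            total = sum(
                pixels[y][x]
                for y in range(gy * block, (gy + 1) * block)
                for x in range(gx * block, (gx + 1) * block)
            )
            means.append(total / (block * block))
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# Two maps that differ only in which large region is shaded.
big_a = make_map(0, 0, 8)    # "Washington" shaded
big_b = make_map(24, 0, 8)   # "California" shaded
print(hamming(average_hash(big_a), average_hash(big_b)))  # 2 of 64 bits

# With a smaller shaded region the images still differ, but the hashes collide.
small_a = make_map(0, 0, 4)
small_b = make_map(24, 0, 4)
print(small_a != small_b)                                      # True
print(hamming(average_hash(small_a), average_hash(small_b)))  # 0
```

In this toy setup a shaded region only changes a bit if it pulls its
block's average across the threshold, so small highlights vanish from
the hash entirely and larger ones move it by a couple of bits.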
Sincerely,
Jonas