I have a bunch of poor-quality photos that I extracted from a PDF. Somebody I know has the good-quality photos somewhere on her computer (a Mac), but it's my understanding that it will be difficult to find them.
I would like to
- loop through each poor quality photo
- perform a reverse image search using each poor-quality photo as the query image and this person's computer as the database to search for the higher-quality images
- and create a copy of each high quality image in one destination folder.
Example pseudocode:
for each image in poorQualityImages:
    search ./macComputer for a higherQualityImage of image
    copy higherQualityImage to ./higherQualityImages
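As a rough illustration of the loop above, here is a minimal Python sketch that uses only the standard library. The names `iter_images`, `copy_matches`, and the `find_match` callback are hypothetical, and the extension list is an assumption. The actual similarity test is left entirely to whatever `find_match` does.

```python
import shutil
from pathlib import Path

# Assumed set of extensions to treat as images; adjust as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".heic"}


def iter_images(root):
    """Yield every file under root whose extension looks like an image."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            yield path


def copy_matches(query_dir, search_root, dest_dir, find_match):
    """Copy the match chosen for each query image into dest_dir.

    find_match(query_path, candidate_paths) is a hypothetical callback that
    returns the best candidate path, or None if nothing is similar enough.
    Assumes query_dir and dest_dir are not inside search_root.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    candidates = list(iter_images(search_root))
    copied = []
    for query in sorted(iter_images(query_dir)):
        match = find_match(query, candidates)
        if match is not None:
            # Prefix with the query's stem so same-named originals don't collide.
            target = dest / f"{query.stem}__{match.name}"
            shutil.copy2(match, target)
            copied.append(target)
    return copied
```

It would be called as something like `copy_matches("./poorQualityImages", "./macComputer", "./higherQualityImages", find_match)`, where `find_match` is where the image comparison would go.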
I only need to perform this action once. I am looking for a tool, GitHub repo, or library that can do this, rather than a deep understanding of content-based image retrieval.
There's a post on Reddit where someone was trying to do something similar.
imgdupes is a program that seems to almost achieve this, but I do not want to delete the duplicates. I want to copy the highest-quality duplicate to a destination folder.
Update
I emailed my former image processing professor, and he sent me this:
Off the top of my head, nothing out of the box.
No guaranteed solution here, but you can narrow the search space. You’d need a little program that outputs the MSE or SSIM similarity index between two images, and then write another program or shell script that scans the hard drive and computes the MSE between each image on the hard drive and each query image, then check the images with the top X percent similarity score.
Something like that. Still not maybe guaranteed to find everything you want. And if the low quality images are of different pixel dimensions than the high quality images, you’d have to do some image scaling to get the similarity index. If the poor quality images have different aspect ratios, that’s even worse.
So I think it’s not hard but not trivial either. The degree of difficulty is partly dependent on the nature of the corruption in the low quality images.
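The approach described in the email could be sketched roughly as follows. This is only one possible reading of it, under stated assumptions: every image is converted to grayscale and squeezed to the same small size (which sidesteps differing pixel dimensions but distorts differing aspect ratios), MSE is computed on those thumbnails, and the lowest-scoring fraction of candidates is kept for manual checking. The function names, the 32x32 size, and the `keep_fraction` parameter are all hypothetical. `load_thumbnail` assumes the Pillow package is installed.

```python
def load_thumbnail(path, size=(32, 32)):
    """Return a flat list of grayscale pixel values at a fixed small size."""
    # Pillow is imported here so the helpers below need no third-party package.
    from PIL import Image

    with Image.open(path) as img:
        # Mode "L" is 8-bit grayscale, so tobytes() yields one byte per pixel.
        return list(img.convert("L").resize(size).tobytes())


def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    if len(a) != len(b):
        raise ValueError("pixel sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_candidates(query, candidates, keep_fraction=0.1):
    """Return (score, name) pairs for the most similar candidates.

    candidates maps a name (for example a file path) to its pixel list.
    Results are sorted by ascending MSE, and only the top keep_fraction
    of them is returned (always at least one).
    """
    scored = sorted(
        ((mse(query, pixels), name) for name, pixels in candidates.items()),
        key=lambda pair: pair[0],
    )
    keep = max(1, int(len(scored) * keep_fraction))
    return scored[:keep]
```

In use, `query` would be `load_thumbnail(...)` of one poor-quality photo and `candidates` would hold `load_thumbnail(...)` results for the images found on the drive. Among the surviving candidates, the one with the largest pixel dimensions is a plausible guess for the high-quality original, though that is an assumption rather than a guarantee.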
UPDATE
imgdupes with the --dry-run option to avoid deleting the images, then process the output information in a script to copy files as needed. Also I'm not sure what's the reason for the tensorflow, keras or pytorch tags, please avoid using tags unrelated to the question. – Dickey