Image Search using Image Similarity Measures [closed]

Asked 25/10, 2020 at 6:23 Answered 10/5, 2022 at 9:19

python image-processing computer-vision opencv

I am working on an image search engine project. The idea is that some images are stored in a database; the user submits a new image, and it is matched against the stored ones. The result should be a list of the stored images that most closely match the query image.

The images are of stamps. The problem is that there are both New and Used stamps. A New stamp is just the stamp image, while a Used one has part of it obscured by a black cancellation mark, so it can never be a perfect match.

Here are a few samples of both (New and Used):

I have tried various measures, such as compare_mse, compare_ssim and compare_nrmse, but they all tilt towards dissimilarity. I have also tried the https://github.com/EdjoLabs/image-match algorithm, but it likewise gives a low similarity score.

Do you think I need some preprocessing or something similar? I have removed the black borders from the images, which made the result somewhat better but still not satisfactory. I have also converted the images to grayscale before matching, again without satisfactory results. Any recommendations or suggestions on how to get high similarity scores would be greatly appreciated! Here is my code:

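import cv2
from skimage.measure import compare_mse, compare_ssim  # scikit-image < 0.18
img1 = cv2.imread('C:\\Users\\Quin\\Desktop\\1frclean2.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('C:\\Users\\Quin\\Desktop\\1fr.jpg', cv2.IMREAD_GRAYSCALE)
# Resize both images to the same (width, height) before comparing
compare_mse(cv2.resize(img1, (355, 500)), cv2.resize(img2, (355, 500)))
compare_ssim(cv2.resize(img1, (355, 500)), cv2.resize(img2, (355, 500)))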

MSE returned 4797.232123943662 and SSIM returned 0.2321816144043102.

Bast answered 25/10, 2020 at 6:23 Comment(7)

Please repeat on topic and how to ask from the intro tour. You're asking for a general discussion of strategies; this is too broad, opinion-based, and thus off-topic here. You have presented no code, no hard data, merely a vague dissatisfaction with your model performance so far. – Niagara 25/10, 2020 at 6:31

You seem to have simply thrown canned metrics at the problem, rather than dealing with the factors you've identified. "I read a few articles", and you have some ideas, but you haven't tried those yet. Model-building is generally about analysis and experimentation for your specific application. It's time to do the experiments you hint at in your post. – Niagara 25/10, 2020 at 6:32

I have done all the measures and experiments but similarity score is low! – Bast 25/10, 2020 at 6:39

Please provide the expected MRE. Show where the intermediate results deviate from the ones you expect. We should be able to paste a single block of your code into a file, run it, and reproduce your results. This also lets us test any suggestions in your context. – Niagara 25/10, 2020 at 6:45

No, you have not done all the measures and experiments. When you have addressed the issues in the links I provided and in my earlier comments, you may have a Stack Overflow question. For now, you're simply on the wrong site. – Niagara 25/10, 2020 at 6:46

I suspect you will need an AI / Deep Learning approach with lots of training data. – Uttasta 25/10, 2020 at 6:47

Give the DCT (Discrete Cosine Transform) a shot. It is normally used as an operation in image compression, but it can also be used to measure image similarity. – Relic 25/10, 2020 at 17:19
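One common way to use the DCT for similarity is a perceptual hash: take the 2-D DCT of a small grayscale image, keep the low-frequency coefficients, and threshold them at their median. The sketch below is only a guess at what the comment has in mind. The names dct_matrix, dct_hash and hamming are hypothetical helpers, and the input is assumed to be a square grayscale array that has already been resized (for example to 32x32 with cv2.resize).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)        # DC row has its own scale factor
    return m

def dct_hash(gray, keep=8):
    """Boolean hash built from the low-frequency DCT block.

    gray: square 2-D array (assumption: already resized, e.g. 32x32).
    Returns keep*keep - 1 bits; the DC term is dropped so that overall
    brightness does not influence the hash.
    """
    gray = np.asarray(gray, dtype=np.float64)
    d = dct_matrix(gray.shape[0])
    coeffs = d @ gray @ d.T                  # separable 2-D DCT-II
    low = coeffs[:keep, :keep].ravel()[1:]   # low frequencies, without DC
    return low > np.median(low)

def hamming(h1, h2):
    """Number of differing bits between two hashes (0 means identical)."""
    return int(np.count_nonzero(h1 != h2))
```

A smaller Hamming distance means more similar images. Whether this tolerates a cancellation mark well enough would have to be checked on real stamp pairs.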

MSE and SSIM are not well suited to this problem, because they compare the images pixel by pixel. Here is an article by NVIDIA showing, for example, how SSIM fails dramatically even in simple cases.
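As a small illustration of that pixel-by-pixel sensitivity (a synthetic sketch, not the stamp images from the question), the snippet below compares a striped pattern with a copy of itself shifted by one pixel. The mse helper is a plain NumPy stand-in for compare_mse.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two equal-sized arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

# Vertical stripes, one pixel wide: columns alternate 0, 255, 0, 255, ...
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (8, 4))
# Same pattern shifted right by one pixel (wrapping around the edge)
shifted = np.roll(stripes, 1, axis=1)

print(mse(stripes, stripes))  # identical images
print(mse(stripes, shifted))  # same content, one-pixel misalignment
```

After the shift every pixel differs by 255, so the MSE reaches its maximum of 255² = 65025 even though both images show the same pattern. Small misalignments between two scans of the same stamp hurt pixel-wise metrics in the same way.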

Here is a collection of research papers on image similarity to understand what is possible.

Two ideas that could be relevant:

Reverse search with embeddings.
Since stamps usually contain text, you might use some vision cloud service to recognize/OCR it, and then compare the text strings for similarity.
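Both ideas reduce to ranking database entries by a similarity score. The sketch below assumes the embeddings or OCR strings have already been obtained from some model or service, which is not shown here. The names rank_by_cosine and text_similarity are hypothetical helpers, and the embedding vectors are assumed to be non-zero.

```python
import difflib
import numpy as np

def rank_by_cosine(query_vec, db_vecs):
    """Return (indices, similarities), sorted from most to least similar.

    query_vec: 1-D embedding of the query image.
    db_vecs:   2-D array with one embedding per database image.
    """
    q = np.asarray(query_vec, dtype=np.float64)
    db = np.asarray(db_vecs, dtype=np.float64)
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity per database entry
    order = np.argsort(-sims)     # descending similarity
    return order, sims[order]

def text_similarity(a, b):
    """Similarity ratio in [0, 1] between two OCR strings, ignoring case."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
```

For a large database, the brute-force dot product would typically be replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.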

Also, here is a related Stack Overflow question with a solution.

Costa answered 28/10, 2020 at 2:35 Comment(2)

Did you try the image embedding approach? I'm wondering how it'd perform with small datasets. – Maidinwaiting 3/5, 2021 at 19:49

@MiguelRueda: No, I have not. – Costa 7/12, 2021 at 21:29

The classical approach:

normalize the images in position, size and angle, as much as possible; for this, you will need a reliable way to find the outer edges and/or the corners; as the aspect ratios can differ, you can normalize the height and adjust the width accordingly;
to recognize a given sample, normalize it and compare it to the reference images of compatible width; you can use the normalized grayscale correlation, which is not sensitive to changes in illumination; you can make it sensitive to colors by handling the three color planes independently;
you can work at full scale or reduced scale (for speed); or at reduced scale, keep the best candidates and discriminate at a higher scale;
it can also be useful to try a few neighboring positions and keep the best, to deal with imperfect alignment.
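The correlation and neighboring-position steps above can be sketched as follows. This is a minimal NumPy version that assumes both images are grayscale and already normalized to the same height and width. The names ncc and best_ncc are hypothetical; for color images, one option is to call ncc on each color plane separately and combine the three scores.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized correlation of two equal-sized arrays, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()   # removing the mean cancels brightness offsets
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)  # cancels contrast scaling
    return float(a @ b / denom) if denom > 0 else 0.0

def best_ncc(sample, reference, max_shift=2):
    """Try small translations of sample against reference; keep the best score.

    Only the overlapping region is compared for each shift.
    """
    h, w = reference.shape[:2]
    best = -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlap between the reference and the shifted sample
            ref = reference[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
            smp = sample[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            best = max(best, ncc(ref, smp))
    return best
```

A score close to 1 indicates a match. A cancellation mark lowers the score, but on well-aligned images it should usually stay above the score of an unrelated stamp.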

To set an acceptance threshold on the matching score, you can test with a few samples and look at the matching scores for pairs of the same stamp and for pairs of different stamps. Hopefully, the worst score for matching stamps (even with a cancellation mark) will be better than the best score for mismatching stamps.
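A minimal sketch of that check, assuming higher scores mean "more similar" and that you have collected two lists of scores by hand. The name pick_threshold is hypothetical, and placing the threshold midway between the two groups is just one reasonable choice.

```python
def pick_threshold(same_scores, diff_scores):
    """Return a threshold midway between the two groups, or None if they overlap.

    same_scores: scores for pairs known to show the same stamp.
    diff_scores: scores for pairs known to show different stamps.
    """
    worst_same = min(same_scores)
    best_diff = max(diff_scores)
    if worst_same <= best_diff:
        return None  # the groups overlap; no clean threshold exists
    return (worst_same + best_diff) / 2.0
```

If the function returns None, the matching score does not separate the two groups on your samples, and the normalization or the similarity measure needs more work.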

The "modern approach": deep learning.

Drily answered 10/5, 2022 at 9:19 Comment(0)
