I've looked at methods like Phasher to find similar images. Basically: resize the image to 8x8, convert it to grayscale, compute the average pixel value, and build a binary hash by checking whether each pixel is above or below that average.
This method is very well explained here: http://hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
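To make the idea concrete, here's a minimal sketch of that average-hash approach in Python with Pillow (the `average_hash` name and the file path are just placeholders I made up):

```python
from PIL import Image

def average_hash(path, size=8):
    # Shrink to size x size and convert to grayscale ("L" mode).
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if it's brighter than the average, else 0.
    return [1 if p > avg else 0 for p in pixels]

hash1 = average_hash("image1.jpg")  # hypothetical file
```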
Example where it works:

- image 1: a computer on a table
- image 2: the same, but with a coin
This works because, with such a heavily reduced grayscale image, both hashes will be almost (or even exactly) the same. So you can conclude the images are similar when 90% or more of the pixels are the same (in the same place!).
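Comparing two hashes bit by bit is then straightforward, something like this (reusing the `average_hash` sketch above; the file names are hypothetical and the 0.9 threshold is just the 90% figure mentioned above):

```python
def similarity(hash_a, hash_b):
    # Fraction of bits that match at the same position.
    matches = sum(1 for a, b in zip(hash_a, hash_b) if a == b)
    return matches / len(hash_a)

if similarity(average_hash("computer.jpg"), average_hash("computer_with_coin.jpg")) >= 0.9:
    print("probably the same scene")
```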
My problem is with images taken from the same point of view but at a slightly different angle, for example these:
In this case the generated hash "fingerprints" are shifted relative to each other, so we can't compare the hashes bit by bit: they come out very different.
The pixels are "similar", but they are not in the same place, since the second picture has more sky and the houses "start" lower than in the first one.
So the hash comparison concludes: "they are different images".
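To show the failure mode in an extreme, purely synthetic case (reusing the `similarity` helper above), a one-row vertical shift can flip every bit even though the "scene" is identical:

```python
# Alternating bright/dark rows stand in for an 8x8 hash of the scene.
rows = [[1] * 8 if r % 2 == 0 else [0] * 8 for r in range(8)]
original = [bit for row in rows for bit in row]

# "Same scene, more sky": everything pushed down by one row.
shifted = [0] * 8 + original[:-8]

print(similarity(original, shifted))  # 0.0 - every bit lands on its opposite
```

Real photos won't be this extreme, but this is essentially what happens with my two pictures.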
Possible solution:
I was thinking about creating a larger hash for the first image, then taking 10 random "sub-hashes" from the second one, and checking whether those 10 sub-hashes appear somewhere in the first, bigger hash (i.e. whether each substring is contained in the bigger string).
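Something like this is what I have in mind, just to illustrate (all the numbers, window size, sample count, threshold, are arbitrary and untested):

```python
import random

def bits_to_string(bits):
    return "".join(str(b) for b in bits)

def count_subhash_hits(big_bits, small_bits, samples=10, window=16):
    # Take `samples` random windows from the second image's hash and check
    # whether each window appears anywhere in the first image's bigger hash.
    big = bits_to_string(big_bits)
    small = bits_to_string(small_bits)
    hits = 0
    for _ in range(samples):
        start = random.randrange(0, len(small) - window + 1)
        if small[start:start + window] in big:  # naive substring search
            hits += 1
    return hits

# e.g. with larger 16x16 hashes from the average_hash sketch above:
# count_subhash_hits(average_hash("a.jpg", size=16), average_hash("b.jpg", size=16))
```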
The problem here, I think, is CPU/time when working with thousands of images, since you have to compare 1 image against 1000, and for each pair compare 10 sub-hashes against one big hash.
Other solutions ? ;-)