If the images are really nearly identical, and are simply translated (i.e. not skewed, rotated, scaled, etc), you could try using cross-correlation.
When you cross-correlate an image with itself (this is the auto-correlation), the maximum value will be at the center of the resulting matrix. If you shift the image vertically or horizontally and then cross-correlate with the original image the position of the maximum value will shift accordingly. By measuring the shift in the position of the maximum value, relative to the expected position, you can determine how far an image has been translated vertically and horizontally.
Here's a toy example in python. Start by importing some stuff, generating a test image, and examining the auto-correlation:
import numpy as np
from scipy.signal import correlate2d
# generate a test image
num_rows, num_cols = 40, 60
image = np.random.random((num_rows, num_cols))
# get the auto-correlation
correlated = correlate2d(image, image, mode='full')
# get the coordinates of the maximum value
max_coords = np.unravel_index(correlated.argmax(), correlated.shape)
This produces coordinates max_coords = (39, 59)
. Now to test the approach, shift the image to the right one column, add some random values on the left, and find the max value in the cross-correlation again:
image_translated = np.concatenate(
(np.random.random((image.shape[0], 1)), image[:, :-1]),
axis=1)
correlated = correlate2d(image_translated, image, mode='full')
new_max_coords = np.unravel_index(correlated.argmax(), correlated.shape)
This gives new_max_coords = (39, 60)
, correctly indicating the image is offset horizontally by 1 (because np.array(new_max_coords) - np.array(max_coords)
is [0, 1]
). Using this information you can shift images to compensate for translation.
Note that, should you decide to go this way, you may have a lot of kinks to work out. Off-by-one errors abound when determining, given the dimensions of an image, where the max coordinate 'should' be following correlation (i.e. to avoid computing the auto-correlation and determining these coordinates empirically), especially if the images have an even number of rows/columns. In the example above, the center is just [num_rows-1, num_cols-1]
but I'm not sure if that's a safe assumption more generally.
But for many cases -- especially those with images that are almost exactly the same and only translated -- this approach should work quite well.