Your options with the OpenCV library are to use any number of methods to select a few points, and then create the transformation between those points using a function like getAffineTransform or getPerspectiveTransform. Note that these functions take points as arguments, not luminosity values (images). You'll want to find points of interest in the first image (say, those marker spots), find those same points in the second image, and pass the two sets of pixel locations to getAffineTransform or getPerspectiveTransform. Once you have that transformation matrix, you can use warpAffine or warpPerspective to warp the second image into the coordinates of the first (or vice versa).
Affine transformations include translation, rotation, scaling, and shearing. Perspective transformations include everything affine transformations do, plus perspective distortion in the x and y directions. For getAffineTransform you need to pass three pairs of points: three points from the first image and the locations of those same three points in the second image. For getPerspectiveTransform, you pass four point pairs. If you want to use all of your marker points, use findHomography instead; it accepts more than four point pairs and computes an optimal homography from all of your matched points.
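For example, a rough sketch with more than four hand-picked point pairs (the coordinates are hypothetical) could look like this:

```python
import cv2
import numpy as np

# Hypothetical marker locations: the same six markers found in each image.
pts_img1 = np.float32([[ 34,  50], [310,  42], [298, 260], [ 40, 255], [170, 150], [220,  90]])
pts_img2 = np.float32([[ 30,  61], [305,  55], [290, 272], [ 36, 268], [165, 163], [215, 102]])

# RANSAC lets findHomography down-weight badly matched points; with carefully
# hand-picked markers you could also pass 0 for a plain least-squares fit.
H, mask = cv2.findHomography(pts_img2, pts_img1, cv2.RANSAC, 3.0)

img2 = cv2.imread("second.png")                     # assumed filename
aligned = cv2.warpPerspective(img2, H, (640, 480))  # warp img2 onto img1's frame
```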
When you use feature detection and matching to align images, these same functions are used in the background; the difference is that the features are found for you. But if that isn't working, simply find the features manually to your liking and then apply these functions to those feature points. For example, you could find the template locations as you already have and define that as a region of interest (ROI), then break the marker points into smaller template pieces and find their locations inside your ROI. That gives you corresponding pairs of points from both images: feed their locations into findHomography, or pick three to use with getAffineTransform or four with getPerspectiveTransform, and you'll get your image transformation, which you can then apply.
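Here is a sketch of that idea; the ROI, marker centres, and patch size are all assumed placeholders, but it shows how matchTemplate inside a ROI can produce the corresponding point pairs for findHomography:

```python
import cv2
import numpy as np

img1 = cv2.imread("first.png", cv2.IMREAD_GRAYSCALE)   # assumed filenames
img2 = cv2.imread("second.png", cv2.IMREAD_GRAYSCALE)

# Hypothetical ROI in img2 from your earlier template match: (x, y, width, height).
rx, ry, rw, rh = 200, 150, 300, 300
roi = img2[ry:ry + rh, rx:rx + rw]

# Hypothetical marker centres in img1; cut a small patch around each one.
centres = [(120, 140), (380, 150), (370, 330), (130, 340), (250, 240)]
half = 15
patches = [(x - half, y - half, img1[y - half:y + half, x - half:x + half])
           for (x, y) in centres]

pts1, pts2 = [], []
for (tx, ty, patch) in patches:
    res = cv2.matchTemplate(roi, patch, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)       # best match location inside the ROI
    ph, pw = patch.shape
    pts1.append([tx + pw / 2, ty + ph / 2])     # marker centre in img1
    pts2.append([rx + max_loc[0] + pw / 2,      # same marker centre in img2
                 ry + max_loc[1] + ph / 2])

H, _ = cv2.findHomography(np.float32(pts2), np.float32(pts1), cv2.RANSAC, 3.0)
```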
Otherwise you'll need something like the Lucas-Kanade optical flow algorithm, which can do direct image matching if you don't want to use feature-based methods. Using the whole image this way is incredibly slow compared to selecting a few feature points and finding a homography from them, but if you only have to do it for a few images it's not such a big deal. To make it more accurate and converge much faster, it helps to provide a starting homography that at least translates the image roughly to the right position (e.g. you do your feature detection, see that a feature is roughly (x', y') pixels away in the second image, and create a homography with that translation).
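OpenCV doesn't expose a Lucas-Kanade inverse compositional homography estimator directly, but findTransformECC is a related direct (intensity-based) alignment method that accepts an initial warp, so a sketch along those lines (filenames and the initial translation are hypothetical) might be:

```python
import cv2
import numpy as np

img1 = cv2.imread("first.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
img2 = cv2.imread("second.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Rough starting guess: a pure translation of (dx, dy) pixels, e.g. taken from
# a quick feature or template match (made-up values here).
dx, dy = 25.0, -12.0
warp_init = np.array([[1, 0, dx],
                      [0, 1, dy],
                      [0, 0,  1]], dtype=np.float32)

criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# Refine the warp by direct intensity matching; MOTION_HOMOGRAPHY estimates a full 3x3 homography.
cc, warp = cv2.findTransformECC(img1, img2, warp_init, cv2.MOTION_HOMOGRAPHY, criteria)

aligned = cv2.warpPerspective(img2, warp, (img1.shape[1], img1.shape[0]),
                              flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```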
You can also likely find Python routines online for homography estimation with the Lucas-Kanade inverse compositional algorithm or the like, if you want to try that. I have my own custom routine for that algorithm, but I can't share it; however, I could run it on your images if you share the originals without the bounding boxes, to provide you with some estimated homographies to compare against.