OpenCV: Strange rotation and translation matrices from decomposeHomographyMat

I am trying to solve a plane-to-plane projection problem using the findHomography function found in the OpenCV library. As a toy example I have a set of points P in R2 and a second set of points Q in R2, where Qx = Px + 50 and Qy = Py; that is, I've offset the x-coordinates by 50. Now I run the following code:

Mat projectionMatrix = findHomography(Q, P);
vector<Point2f> projectedPoints(Q.size());
perspectiveTransform(Q, projectedPoints, projectionMatrix);

And that gives me P, which is great. However, now I would like the rotation and translation matrices, R and T, and this is where I get confused. OpenCV 3 has a function called decomposeHomographyMat that returns up to 4 solutions for R and T (it also returns the plane normals, but I do not store those). I run it as such:

vector<Mat> Rs;
vector<Mat> Ts;
decomposeHomographyMat(projectionMatrix, cameraMatrix, Rs, Ts, noArray());

The cameraMatrix I use is tried and tested from previous experiments. OK, so I get my four results. Looking at them, I notice that I get the identity matrix as a result for all R, which is great. However, all of the translation vectors are [0,0,0]T, whereas I would expect at least one of them to be [-50,0,0]T. Is there something I need to do with the result from decomposeHomographyMat to get the expected behaviour?
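For reference, here is a minimal, self-contained sketch of the setup described above (the point values and the K below are illustrative placeholders, not my actual data or calibration):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
  // Toy data: Q is P offset by +50 along x (Qx = Px + 50, Qy = Py).
  std::vector<cv::Point2f> P{{0, 0}, {100, 0}, {0, 100}, {100, 100}};
  std::vector<cv::Point2f> Q;
  for (const auto& p : P) Q.emplace_back(p.x + 50, p.y);

  // Mapping Q -> P should recover roughly [1 0 -50; 0 1 0; 0 0 1].
  cv::Mat H = cv::findHomography(Q, P);
  std::cout << "H =\n" << H << std::endl;

  // Illustrative intrinsics, not a real calibration.
  cv::Mat1d K = (cv::Mat1d(3, 3) << 1000,    0, 320,
                                       0, 1000, 240,
                                       0,    0,   1);

  std::vector<cv::Mat> Rs, Ts;
  cv::decomposeHomographyMat(H, K, Rs, Ts, cv::noArray());
  for (const auto& t : Ts) std::cout << "t =\n" << t << std::endl;
  return 0;
}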

Thanks

Component answered 11/3, 2016 at 13:56 Comment(3)
Have you verified that the computed homography is not zero? – Extrorse
@Extrorse Yes, the computed homography is (with some very minor rounding applied): [1 0 -50; 0 1 0; 0 0 1]. – Component
This is just a guess, but maybe decomposeHomographyMat does not handle the special case of an affine homography? – Extrorse

It turns out I was wrong on some points, so I have decided to rewrite this answer.

In brief: you are getting strange results because of an incorrect intrinsic parameter matrix.

Using the terminology from the paper Malis, E. and Vargas, M., "Deeper understanding of the homography decomposition for vision-based control" (on which the homography decomposition in OpenCV is based): the transformation between pixel coordinates of the two views is the projective homography matrix, denoted G, and the result of its normalization, H = K^-1 * G * K (where K is the camera's calibration matrix), is the Euclidean homography matrix, which is the one that encodes the rotation and translation.

cv::findHomography() works in pixel coordinates and estimates the projective homography. In order to decompose the motion into translation and rotation, cv::decomposeHomographyMat() first normalizes it to obtain the Euclidean homography matrix; it relies on the K provided by the user to perform this normalization.
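To make that normalization step concrete, here is a small sketch of the operation (my own illustration of the formula above, not OpenCV's internal code; the K here is an arbitrary example):

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
  // G: a projective homography in pixel coordinates (the asker's example,
  // a pure 50-pixel shift along x). K: an illustrative calibration matrix.
  cv::Mat1d G = (cv::Mat1d(3, 3) << 1, 0, -50,
                                    0, 1,   0,
                                    0, 0,   1);
  cv::Mat1d K = (cv::Mat1d(3, 3) << 1000,    0, 320,
                                       0, 1000, 240,
                                       0,    0,   1);

  // Normalization performed before the decomposition:
  // H = K^-1 * G * K maps the pixel-space homography into
  // normalized camera coordinates.
  cv::Mat1d H = K.inv() * G * K;
  std::cout << "Euclidean homography H =\n" << H << std::endl;
  return 0;
}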

As for the estimation of K, I think this is beyond the scope of this question. This problem is called camera auto-calibration; here's a relevant quote from the wiki article on it:

Therefore, three views are the minimum needed for full calibration with fixed intrinsic parameters between views. Quality modern imaging sensors and optics may also provide further prior constraints on the calibration such as zero skew (orthogonal pixel grid) and unity aspect ratio (square pixels). Integrating these priors will reduce the minimal number of images needed to two.

It seems that you can extract K from image correspondences over 2 frames from the same camera under the zero-skew and square-pixel assumptions. But I am not familiar with this subject, so I can't give you more suggestions.

So, to check that my interpretation is correct, I have made a small example that projects some points on a plane onto 2 virtual cameras, finds a homography, decomposes it, and allows comparing this decomposition to the ground-truth rotation and translation vectors. This is better than real inputs, because this way we know K precisely and can decouple the error in its estimation from the error in R and t. For the inputs I have checked, it was able to correctly estimate the rotation and translation vectors, although for some reason the translation is always 10 times smaller than the ground truth. Maybe this decomposition is defined only up to a scale (I'm not sure), but it is interesting that it is related to the ground-truth value by a fixed coefficient.

Here's the source:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>


int main() {
  // set up a virtual camera
  float f = 100, w = 640, h = 480;

  cv::Mat1f K = (cv::Mat1f(3, 3) <<
      f, 0, w/2,
      0, f, h/2,
      0, 0,   1);

  // set transformation from 1st to 2nd camera (assume K is unchanged)
  cv::Mat1f rvecDeg = (cv::Mat1f(3, 1) << 45, 12, 66);
  cv::Mat1f t = (cv::Mat1f(3, 1) << 100, 200, 300);

  std::cout << "-------------------------------------------\n";
  std::cout << "Ground truth:\n";

  std::cout << "K = \n" << K << std::endl << std::endl;
  std::cout << "rvec = \n" << rvecDeg << std::endl << std::endl;
  std::cout << "t = \n" << t << std::endl << std::endl;

  // set up points on a plane
  std::vector<cv::Point3f> p3d{{0, 0, 10},
                               {100, 0, 10},
                               {0, 100, 10},
                               {100, 100, 10}};

  // project on both cameras
  std::vector<cv::Point2f> Q, P;

  cv::projectPoints(p3d,
                    cv::Mat1d::zeros(3, 1),
                    cv::Mat1d::zeros(3, 1),
                    K,
                    cv::Mat(),
                    Q);

  cv::projectPoints(p3d,
                    rvecDeg*CV_PI/180,
                    t,
                    K,
                    cv::Mat(),
                    P);

  // find homography
  cv::Mat H = cv::findHomography(Q, P);

  std::cout << "-------------------------------------------\n";
  std::cout << "Estimated H = \n" << H << std::endl << std::endl;


  // check by reprojection
  std::vector<cv::Point2f> P_(P.size());
  cv::perspectiveTransform(Q, P_, H);

  float sumError = 0;

  for (size_t i = 0; i < P.size(); i++) {
    sumError += cv::norm(P[i] - P_[i]);
  }

  std::cout << "-------------------------------------------\n";
  std::cout << "Average reprojection error = "
      << sumError/P.size() << std::endl << std::endl;


  // decompose the homography using the camera intrinsic matrix K
  std::vector<cv::Mat> Rs, Ts;
  cv::decomposeHomographyMat(H,
                             K,
                             Rs, Ts,
                             cv::noArray());

  std::cout << "-------------------------------------------\n";
  std::cout << "Estimated decomposition:\n\n";
  std::cout << "rvec = " << std::endl;
  for (auto R_ : Rs) {
    cv::Mat1d rvec;
    cv::Rodrigues(R_, rvec);
    std::cout << rvec*180/CV_PI << std::endl << std::endl;
  }

  std::cout << std::endl;

  std::cout << "t = " << std::endl;
  for (auto t_ : Ts) {
    std::cout << t_ << std::endl << std::endl;
  }

  return 0;
}

And here's the output on my machine:

-------------------------------------------
Ground truth:
K =
[100, 0, 320;
0, 100, 240;
0, 0, 1]

rvec =
[45;
12;
66]

t =
[100;
200;
300]

-------------------------------------------
Estimated H =
[0.04136041220427821, 0.04748763375951008, 358.5557917287962;
0.05074854454707714, 0.06137211243830468, 297.4585754092336;
8.294458769850147e-05, 0.0002294875562580223, 1]

-------------------------------------------
Average reprojection error = 0

-------------------------------------------
Estimated decomposition:

rvec =
[-73.21470385654712;
56.64668212487194;
82.09114210289061]

[-73.21470385654712;
56.64668212487194;
82.09114210289061]

[45.00005330430893;
12.00000697952995;
65.99998380038915]

[45.00005330430893;
12.00000697952995;
65.99998380038915]


t =
[10.76993852870029;
18.60689642878277;
30.62344129378435]

[-10.76993852870029;
-18.60689642878277;
-30.62344129378435]

[10.00001378255982;
20.00002581449634;
30.0000336510648]

[-10.00001378255982;
-20.00002581449634;
-30.0000336510648]

As you can see, there is a correct estimate of the rotation vector among the hypotheses, and an up-to-scale correct estimate of the translation.
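A side note on the scale (this is confirmed by the other answer below): the decomposition recovers t/d, where d is the distance from the first camera to the plane. In the example above the plane lies at z = 10, so d = 10, which explains the constant factor of 10. If d is known, the metric translation can be recovered; a minimal sketch, assuming d = 10 and one of the translation hypotheses printed above:

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
  // One of the translation hypotheses printed above, copied by hand.
  cv::Mat1d tEst = (cv::Mat1d(3, 1) << 10.0, 20.0, 30.0);

  // decomposeHomographyMat() returns t/d; multiplying by the known
  // plane distance d recovers the metric translation.
  double d = 10.0;
  std::cout << "metric t =\n" << d * tEst << std::endl;  // ~[100; 200; 300]
  return 0;
}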

Nitrobacteria answered 11/3, 2016 at 14:51 Comment(10)
Thank you for a thorough and informative answer! I've read some of the references you supplied and I think I follow the general concepts. I am still unsure about the role of K, as that seems to be completely ignored. However, when I apply the workflow above to two pictures with a 30-degree rotation of the camera and marked points, my results are bad. Again, the perspective transform using the estimated H gives a perfect result, but the decomposition yields bad results. My guess is that I am missing some kind of normalization of my points when applying R and t, but I cannot figure out what. – Component
I'll elaborate a little. I've manually placed 12 markers in the overlapping parts of two scenes where the camera was rotated 30 degrees around the y axis. Since the camera was rotated around a single axis, I had imagined that I would get an R showing the rotation and t = 0. I am, however, seeing fairly large translations and just nonsense rotations. – Component
Were the point matches evenly distributed over the image area? Because when you observe a small part of a rotating object, it can be locally described by a translational motion. – Nitrobacteria
Also, now I am not so sure that the K matrix should be identity when you use cv::findHomography() to estimate the transformation. The coordinates are indeed normalized under the hood, but the normalization is stored in matrix form, and its inverse is reapplied to the estimated transformation before returning the result. So cv::findHomography() returns the homography in pixel coordinates. But unfortunately I cannot figure out how to find the corresponding K matrix in that case. – Nitrobacteria
I have glanced across this paper ais.informatik.uni-freiburg.de/teaching/ws11/robotics2/pdfs/… It seems that you can obtain K from a homography matrix using Cholesky factorization. If I have some time in the evening, I'll try it and post a snippet. – Nitrobacteria
I've tried to spread out my points over the entire overlapping section, so that should not be the issue. If it's any help, I have a calibration value for K from other experiments, so I don't really need to extract K from H. I just can't figure out why my results are off. I'll post my data points below. – Component
P: 454,996; 273,714; 604,696; 879,679; 341,563; 479,508; 521,491; 1026,443; 541,373; 542,320; 589,317; 263,129 P*: 1246,979; 1088,697; 1392,699; 1704,698; 1144,562; 1270,510; 1310,493; 1905,426; 1331,377; 1331,324; 1378,317; 1083,171 K: [1460 0 960; 0 1460 540; 0 0 1] – Component
I have edited the answer, and although it doesn't completely solve your problem, I hope it may still be helpful. – Nitrobacteria
Thank you very much for sharing this informative answer. Regarding "there's an up-to-scale correct estimation of translation": may I ask if it is possible to get the real scale of the translation? – Alodium
May I kindly ask you to elaborate on the camera coordinate frames in your sample code? I do not understand what 45, 12, 66 correspond to. Are they camera rotations about x (right), y (downward), and z (forward), respectively? Besides, does this virtual camera follow the right-hand rule or the left-hand one? – Provided

You should instead use solvePnP on each plane, and then get the relative translation and rotation from the two camera poses (assuming you have at least 4 points). The problem with decomposeHomographyMat is that you can get up to 4 solutions... You can remove the two that would land the points outside the image FOV, but this is still a problem.
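A sketch of that idea, reusing the synthetic setup from the answer above (in C++ to match it; the composition formulas R_rel = R2 * R1^T and t_rel = t2 - R_rel * t1 are standard):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
  // Same synthetic scene as in the answer above.
  float f = 100, w = 640, h = 480;
  cv::Mat1d K = (cv::Mat1d(3, 3) << f, 0, w/2, 0, f, h/2, 0, 0, 1);

  std::vector<cv::Point3f> p3d{{0, 0, 10}, {100, 0, 10},
                               {0, 100, 10}, {100, 100, 10}};
  cv::Mat1d rvecDeg = (cv::Mat1d(3, 1) << 45, 12, 66);
  cv::Mat1d tGt = (cv::Mat1d(3, 1) << 100, 200, 300);

  std::vector<cv::Point2f> Q, P;
  cv::projectPoints(p3d, cv::Mat1d::zeros(3, 1), cv::Mat1d::zeros(3, 1),
                    K, cv::Mat(), Q);
  cv::projectPoints(p3d, rvecDeg * CV_PI / 180, tGt, K, cv::Mat(), P);

  // Pose of the plane in each camera (needs at least 4 points).
  cv::Mat rvec1, t1, rvec2, t2;
  cv::solvePnP(p3d, Q, K, cv::Mat(), rvec1, t1);
  cv::solvePnP(p3d, P, K, cv::Mat(), rvec2, t2);

  // Compose the relative motion from camera 1 to camera 2:
  // R_rel = R2 * R1^T,  t_rel = t2 - R_rel * t1.
  cv::Mat R1, R2;
  cv::Rodrigues(rvec1, R1);
  cv::Rodrigues(rvec2, R2);
  cv::Mat Rrel = R2 * R1.t();
  cv::Mat trel = t2 - Rrel * t1;

  cv::Mat rvecRel;
  cv::Rodrigues(Rrel, rvecRel);
  std::cout << "relative rvec (deg) =\n" << rvecRel * 180 / CV_PI << std::endl;
  std::cout << "relative t =\n" << trel << std::endl;  // should match ground truth
  return 0;
}

As an aside, if you stay with the homography decomposition, newer OpenCV versions (3.4.1 and later, if I remember correctly) also provide cv::filterHomographyDecompByVisibleRefpoints(), which prunes the solutions that are inconsistent with the observed points.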

  • @alexisrozhkov – I've found the error in your code: the article and the function both assume that the Z plane is at 1 (not z = 10 like you had). Below is a corrected reimplementation of your code above (fixed, and in Python):
import cv2
import numpy as np


# set up a virtual camera
f = 100
w = 640
h = 480

K = np.array([[f, 0, w/2],
              [0, f, h/2],
              [0, 0,   1]])
dist_coeffs = np.zeros(5)
# set transformation from 1st to 2nd camera (assume K is unchanged)
rvecDeg = np.array([[45, 12, 66]]).T
t = np.array([[100.0, 200, 300]]).T

print("-------------------------------------------\n")
print("Ground truth:\n")

print("K = \n" + str(K))
print("rvec = \n" + str(rvecDeg))
print("t = \n" + str(t))

# set up points on a plane
p3d = np.array([[0, 0, 1],
                [100, 0, 1],
                [0, 100, 1],
                [100, 100, 1]], dtype=np.float64)

# project on both cameras

Q, _ = cv2.projectPoints(p3d,
                         np.zeros((3, 1)),
                         np.zeros((3, 1)),
                         K,
                         dist_coeffs)

P, _ = cv2.projectPoints(p3d,
                         rvecDeg*np.pi/180,
                         t,
                         K,
                         dist_coeffs)

# find homography
H, _ = cv2.findHomography(Q, P)

print("-------------------------------------------\n")
print("Estimated H = \n" + str(H))


# check by reprojection
P_ = cv2.perspectiveTransform(Q, H)

sumError = 0

for i in range(P.shape[0]):
    sumError += np.linalg.norm(P[i] - P_[i])


print("-------------------------------------------\n")
print("Average reprojection error = "+str(sumError/P.shape[0]))


# decompose using the camera intrinsic matrix K
num_res, Rs, ts, n = cv2.decomposeHomographyMat(H, K)

print("-------------------------------------------\n")
print("Estimated decomposition:\n\n")
for i, Rt in enumerate(zip(Rs, ts)):
    R, t = Rt
    print("option " + str(i+1))
    print("rvec = ")
    rvec, _ = cv2.Rodrigues(R)
    print(rvec*180/np.pi)
    print("t = ")
    print(t)

And this is the result (option 3 came out as the correct one):

-------------------------------------------

Ground truth:

K = 
[[100.   0. 320.]
 [  0. 100. 240.]
 [  0.   0.   1.]]
rvec = 
[[45]
 [12]
 [66]]
t = 
[[100.]
 [200.]
 [300.]]
-------------------------------------------

Estimated H = 
[[3.93678513e-03 4.51998690e-03 3.53830416e+02]
 [4.83037187e-03 5.84154353e-03 3.05790229e+02]
 [7.89487379e-06 2.18431675e-05 1.00000000e+00]]
-------------------------------------------

Average reprojection error = 1.1324020050061362e-05
-------------------------------------------

Estimated decomposition:


option 1
rvec = 
[[-78.28555877]
 [ 58.01301837]
 [ 81.41634175]]
t = 
[[100.79816988]
 [198.59277542]
 [300.6675498 ]]
option 2
rvec = 
[[-78.28555877]
 [ 58.01301837]
 [ 81.41634175]]
t = 
[[-100.79816988]
 [-198.59277542]
 [-300.6675498 ]]
option 3
rvec = 
[[45.0000205 ]
 [12.00005488]
 [65.99999505]]
t = 
[[100.00010826]
 [200.00026791]
 [300.00034698]]
option 4
rvec = 
[[45.0000205 ]
 [12.00005488]
 [65.99999505]]
t = 
[[-100.00010826]
 [-200.00026791]
 [-300.00034698]]
Cere answered 12/4, 2020 at 10:46 Comment(0)
