It turns out I was wrong on some points, so I have decided to rewrite this answer.

In brief: you are getting strange results because your intrinsic parameter matrix is incorrect.
Using the terminology of Malis and Vargas, "Deeper understanding of the homography decomposition for vision-based control" (the paper on which OpenCV's homography decomposition is based): the perspective transformation between the two image planes is denoted by G and called the projective homography matrix, and the result of its normalization, H = K^-1 * G * K (where K is the camera's calibration matrix), is called the Euclidean homography matrix. It has the structure H = R + (t/d)*n^T, where n is the plane normal and d is the distance from the first camera to the plane.
cv::findHomography() estimates the projective homography G from pixel correspondences. cv::decomposeHomographyMat() also takes G as input, but in order to decompose it into translation and rotation it first normalizes it to obtain the Euclidean homography H. It relies on the K provided by the user to perform this normalization, which is why a wrong K breaks the decomposition.
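To make that concrete, here is a minimal sketch of the normalization step (the function name is mine; cv::decomposeHomographyMat() does the equivalent internally), assuming G and K are CV_64F matrices:

// Turn the pixel-space projective homography G (e.g. from cv::findHomography())
// into the Euclidean homography H = R + (t/d)*n^T, defined up to scale.
cv::Mat normalizeHomography(const cv::Mat& G, const cv::Mat& K) {
// remove the calibration from the pixel-space transformation
cv::Mat H = K.inv() * G * K;
// fix the overall scale: for a Euclidean homography the middle
// singular value equals 1, so divide by it
cv::SVD svd(H, cv::SVD::NO_UV);
return H / svd.w.at<double>(1);
}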
As for the estimation of K, I think it is beyond the scope of this question. The problem is called camera auto-calibration; here's a relevant quote from the Wikipedia article on it:
Therefore, three views are the minimum needed for full calibration with fixed intrinsic parameters between views. Quality modern imaging sensors and optics may also provide further prior constraints on the calibration such as zero skew (orthogonal pixel grid) and unity aspect ratio (square pixels). Integrating these priors will reduce the minimal number of images needed to two.
It seems that you can extract K from point correspondences over 2 frames from the same camera under the zero-skew and square-pixel assumptions, but I am not familiar with this subject, so I can't give you more suggestions.
So, to check whether my interpretation is correct, I have made a small example that projects some points on a plane onto 2 virtual cameras, finds a homography, decomposes it, and allows comparing this decomposition against the ground-truth rotation and translation vectors. Synthetic inputs are better than real ones here, because we know K precisely and can decouple the error in its estimation from the error in R and t. For the inputs I have checked, the rotation and translation vectors were estimated correctly, except that the recovered translation is always 10 times smaller than the ground truth. This is expected: since H = R + (t/d)*n^T, cv::decomposeHomographyMat() can only recover the translation scaled by 1/d, and in this example the plane lies at distance d = 10 from the first camera.
Here's the source:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
int main() {
// set up a virtual camera
float f = 100, w = 640, h = 480;
cv::Mat1f K = (cv::Mat1f(3, 3) <<
f, 0, w/2,
0, f, h/2,
0, 0, 1);
// set transformation from 1st to 2nd camera (assume K is unchanged)
cv::Mat1f rvecDeg = (cv::Mat1f(3, 1) << 45, 12, 66);
cv::Mat1f t = (cv::Mat1f(3, 1) << 100, 200, 300);
std::cout << "-------------------------------------------\n";
std::cout << "Ground truth:\n";
std::cout << "K = \n" << K << std::endl << std::endl;
std::cout << "rvec = \n" << rvecDeg << std::endl << std::endl;
std::cout << "t = \n" << t << std::endl << std::endl;
// set up points on a plane
std::vector<cv::Point3f> p3d{{0, 0, 10},
{100, 0, 10},
{0, 100, 10},
{100, 100, 10}};
// project on both cameras
std::vector<cv::Point2f> Q, P;
cv::projectPoints(p3d,
cv::Mat1d::zeros(3, 1),
cv::Mat1d::zeros(3, 1),
K,
cv::Mat(),
Q);
cv::projectPoints(p3d,
rvecDeg*CV_PI/180,
t,
K,
cv::Mat(),
P);
// find homography
cv::Mat H = cv::findHomography(Q, P);
std::cout << "-------------------------------------------\n";
std::cout << "Estimated H = \n" << H << std::endl << std::endl;
// check by reprojection
std::vector<cv::Point2f> P_(P.size());
cv::perspectiveTransform(Q, P_, H);
float sumError = 0;
for (size_t i = 0; i < P.size(); i++) {
sumError += cv::norm(P[i] - P_[i]);
}
std::cout << "-------------------------------------------\n";
std::cout << "Average reprojection error = "
<< sumError/P.size() << std::endl << std::endl;
// decompose the homography using the calibration matrix K
std::vector<cv::Mat> Rs, Ts;
cv::decomposeHomographyMat(H,
K,
Rs, Ts,
cv::noArray());
std::cout << "-------------------------------------------\n";
std::cout << "Estimated decomposition:\n\n";
std::cout << "rvec = " << std::endl;
for (auto R_ : Rs) {
cv::Mat1d rvec;
cv::Rodrigues(R_, rvec);
std::cout << rvec*180/CV_PI << std::endl << std::endl;
}
std::cout << std::endl;
std::cout << "t = " << std::endl;
for (auto t_ : Ts) {
std::cout << t_ << std::endl << std::endl;
}
return 0;
}
And here's the output on my machine:
-------------------------------------------
Ground truth:
K =
[100, 0, 320;
0, 100, 240;
0, 0, 1]
rvec =
[45;
12;
66]
t =
[100;
200;
300]
-------------------------------------------
Estimated H =
[0.04136041220427821, 0.04748763375951008, 358.5557917287962;
0.05074854454707714, 0.06137211243830468, 297.4585754092336;
8.294458769850147e-05, 0.0002294875562580223, 1]
-------------------------------------------
Average reprojection error = 0
-------------------------------------------
Estimated decomposition:
rvec =
[-73.21470385654712;
56.64668212487194;
82.09114210289061]
[-73.21470385654712;
56.64668212487194;
82.09114210289061]
[45.00005330430893;
12.00000697952995;
65.99998380038915]
[45.00005330430893;
12.00000697952995;
65.99998380038915]
t =
[10.76993852870029;
18.60689642878277;
30.62344129378435]
[-10.76993852870029;
-18.60689642878277;
-30.62344129378435]
[10.00001378255982;
20.00002581449634;
30.0000336510648]
[-10.00001378255982;
-20.00002581449634;
-30.0000336510648]
As you can see, one of the hypotheses matches the ground-truth rotation vector, and its translation matches the ground truth up to the 1/d scale factor (here d = 10), so if you know the distance to the plane you can recover the metric translation as t = d * t_estimated. Each solution also appears with the sign of t flipped, because the decomposition cannot distinguish (R, t, n) from (R, -t, -n); that is why 4 hypotheses are returned instead of 2.
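A final practical note: cv::decomposeHomographyMat() always returns up to 4 hypotheses, and selecting the right one is up to you. One way (a sketch, reusing H, K, Q and P from the program above, not something I have verified) is to keep only the solutions for which the observed points lie in front of the camera; recent OpenCV versions (3.4.1+, if I recall correctly) provide cv::filterHomographyDecompByVisibleRefpoints() for this. It needs the plane normals, so request them from cv::decomposeHomographyMat() instead of passing cv::noArray():

// decompose again, this time also asking for the plane normals
std::vector<cv::Mat> Rs2, Ts2, Ns2;
cv::decomposeHomographyMat(H, K, Rs2, Ts2, Ns2);
// keep only the hypotheses consistent with the points being visible
cv::Mat possibleSolutions; // indices into Rs2/Ts2/Ns2 that pass the test
cv::filterHomographyDecompByVisibleRefpoints(Rs2, Ns2, Q, P, possibleSolutions);
for (int i = 0; i < possibleSolutions.rows; i++) {
int idx = possibleSolutions.at<int>(i);
cv::Mat1d rvec;
cv::Rodrigues(Rs2[idx], rvec);
std::cout << "solution " << idx << ":\nrvec = \n" << rvec*180/CV_PI
<< "\nt = \n" << Ts2[idx] << std::endl;
}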