If I understand correctly, you have two images taken from a smartphone camera, for which you know (at least approximately) the intrinsics matrix and the relative 3D rotation between the poses where the two images were taken. You also say there is a small translation between the two images, which is good, since you would not be able to compute depth otherwise.
Unfortunately, you do not have enough information to estimate depth directly. Basically, estimating depth from two images requires you to:
1. Find point correspondences between the two images
Depending on what you want to do, this can be done either for all points in the images (i.e. in a dense way) or only for a few points (i.e. in a sparse way). Of course the latter is less computationally expensive, hence more appropriate for smartphones.
Dense matching requires rectifying the images first, in order to make the computation tractable; even so, it will probably take a long time on a smartphone. Image rectification can be achieved either with a calibrated method (which requires knowing the rotation and translation between the two image poses, the intrinsics camera matrix and the distortion coefficients of the camera) or with a non-calibrated method (which requires sparse point matches between the two images and the fundamental matrix, which can itself be estimated from those matches).
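As a rough illustration, here is a minimal sketch of the non-calibrated rectification route in Python/OpenCV (the Android API mirrors it); `img1`, `img2`, `pts1` and `pts2` are assumed inputs, the last two being Nx2 arrays of sparse matches (see the matching sketch further below):

```python
import numpy as np
import cv2

# Assumed inputs: img1, img2 (the two images) and pts1, pts2 (Nx2 float32
# arrays of matched pixel coordinates between them).
h, w = img1.shape[:2]

# Estimate the fundamental matrix from the matches (RANSAC rejects outliers).
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# Compute the rectifying homographies without knowing the calibration.
ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))

# Warp both images so corresponding points end up on the same scanline,
# which is what makes dense matching tractable.
rect1 = cv2.warpPerspective(img1, H1, (w, h))
rect2 = cv2.warpPerspective(img2, H2, (w, h))
```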
Sparse matching requires matching salient features (e.g. SURF or SIFT, or more efficient ones) between the two images. This has the advantage of being both more efficient and more accurate than dense matching.
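For instance, a minimal sparse-matching sketch using ORB (a free and efficient alternative to SURF/SIFT available in OpenCV), assuming `img1` and `img2` are the two grayscale images:

```python
import numpy as np
import cv2

# Assumed inputs: img1, img2 (grayscale images).
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching; cross-check keeps only mutual best matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Pixel coordinates of the matched points, as Nx2 float32 arrays.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
```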
2. Triangulate the corresponding points to estimate depth
Triangulation requires knowing the intrinsics parameters (camera matrix and distortion coefficients) and the extrinsics parameters (relative rotation and translation between the poses from which the images were taken).
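In OpenCV terms, once you have all of these, triangulation looks roughly like the following sketch (all variable names are assumptions, not a drop-in solution):

```python
import numpy as np
import cv2

# Assumed inputs: K (3x3 intrinsics), dist (distortion coefficients),
# R (3x3 relative rotation), t (3x1 relative translation),
# pts1, pts2 (Nx2 arrays of matched pixel coordinates).

# Undistort and normalize the points; afterwards the projection matrices
# no longer need to include K or the distortion model.
n1 = cv2.undistortPoints(pts1.reshape(-1, 1, 2).astype(np.float64), K, dist)
n2 = cv2.undistortPoints(pts2.reshape(-1, 1, 2).astype(np.float64), K, dist)

# Projection matrices in normalized coordinates; the first pose is the origin.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t.reshape(3, 1)])

# Returns 4xN homogeneous coordinates; divide by the last row for 3D points.
pts4d = cv2.triangulatePoints(P1, P2, n1.reshape(-1, 2).T, n2.reshape(-1, 2).T)
pts3d = (pts4d[:3] / pts4d[3]).T  # Nx3; depth is the Z component
```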
In your case, assuming your relative rotation and intrinsics camera matrix are accurate enough (which I doubt), you still lack the translation and the distortion coefficients.
However, you can still apply the classical approach for stereo triangulation, which requires an accurate calibration of your camera and an estimation of the full relative pose (i.e. rotation + translation).
Calibrating your camera will enable you to estimate an accurate intrinsics matrix and the associated distortion coefficients. Doing this is recommended because your camera will not be exactly the same as the cameras in other phones (even if it is the same phone model). See e.g. this tutorial, which shows the methodology even though the code samples are in C++ (the equivalent must exist for Android).
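In Python/OpenCV, the chessboard-based calibration from that kind of tutorial boils down to something like this (the image path and board size are placeholders):

```python
import glob
import numpy as np
import cv2

# Chessboard with 9x6 inner corners; "calib/*.jpg" is a placeholder path
# to several photos of the board taken from different angles.
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points, size = [], [], None
for path in glob.glob("calib/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the intrinsics matrix, dist the distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, size, None, None)
```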
Once you have accurately estimated the intrinsics parameters, one way to estimate the full relative pose (i.e. rotation and translation) is to compute the fundamental matrix (using feature matches found between the two images), then to infer the essential matrix using the camera matrix, and finally to decompose the essential matrix into the relative rotation and translation. See this link, which gives the formula to infer the essential matrix from the fundamental matrix, and this link, which explains how to compute the rotation and translation from the essential matrix.
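A sketch of that chain in Python/OpenCV, assuming `pts1`/`pts2` are matched points as above and `K` the calibrated intrinsics matrix:

```python
import numpy as np
import cv2

# 1. Fundamental matrix from the matches (RANSAC to reject outliers).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# 2. Essential matrix from the fundamental matrix: E = K^T . F . K
E = K.T @ F @ K

# 3. Decompose E into the relative rotation and translation; recoverPose
#    resolves the four-fold decomposition ambiguity by keeping the solution
#    that puts the triangulated points in front of both cameras.
retval, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K)
```

Keep in mind that the translation recovered this way is only known up to scale, so the triangulated depths will be in arbitrary units unless you know the real baseline between the two poses.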
Also, to answer your other question related to `warpPerspective`: you would need to use `K.R.inv(K)` or `K.inv(R).inv(K)`, depending on which image you are warping. This is because `R` is a 3D rotation, which by itself has nothing to do with pixel coordinates: you have to back-project the pixels with `inv(K)`, rotate, and re-project with `K` to obtain a homography acting on pixel coordinates.
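Concretely, a sketch of that pixel-space homography (with `K`, `R` and `img` as assumed inputs):

```python
import numpy as np
import cv2

# H maps pixels of one image to the other under a pure rotation R:
# back-project with inv(K), rotate, re-project with K.
H = K @ R @ np.linalg.inv(K)   # use K . inv(R) . inv(K) for the other image
h, w = img.shape[:2]
warped = cv2.warpPerspective(img, H, (w, h))
```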