Note: There is a version of this with proper math typesetting on the Math SE.
Computing a projective transformation
A perspective is a special case of a projective transformation, which in turn is defined by four points.
Step 1: Starting with the 4 positions in the source image, named (x1,y1)
through (x4,y4)
, you solve the following system of linear equations:
[x1 x2 x3] [λ] [x4]
[y1 y2 y3]∙[μ] = [y4]
[ 1 1 1] [τ] [ 1]
The colums form homogenous coordinates: one dimension more, created by adding a 1
as the last entry. In subsequent steps, multiples of these vectors will be used to denote the same points. See the last step for an example of how to turn these back into two-dimensional coordinates.
Step 2: Scale the columns by the coefficients you just computed:
[λ∙x1 μ∙x2 τ∙x3]
A = [λ∙y1 μ∙y2 τ∙y3]
[λ μ τ ]
This matrix will map (1,0,0)
to a multiple of (x1,y1,1)
, (0,1,0)
to a multiple of (x2,y2,1)
, (0,0,1)
to a multiple of (x3,y3,1)
and (1,1,1)
to (x4,y4,1)
. So it will map these four special vectors (called basis vectors in subsequent explanations) to the specified positions in the image.
Step 3: Repeat steps 1 and 2 for the corresponding positions in the destination image, in order to obtain a second matrix called B
.
This is a map from basis vectors to destination positions.
Step 4: Invert B
to obtain B⁻¹
.
B
maps from basis vectors to the destination positions, so the inverse matrix maps in the reverse direction.
Step 5: Compute the combined Matrix C = A∙B⁻¹
.
B⁻¹
maps from destination positions to basis vectors, while A
maps from there to source positions. So the combination maps destination positions to source positions.
Step 6: For every pixel (x,y)
of the destination image, compute the product
[x'] [x]
[y'] = C∙[y]
[z'] [1]
These are the homogenous coordinates of your transformed point.
Step 7: Compute the position in the source image like this:
sx = x'/z'
sy = y'/z'
This is called dehomogenization of the coordinate vector.
All this math would be so much easier to read and write if SO were to support MathJax… ☹
Choosing the image size
The above aproach assumes that you know the location of your corners in the destination image. For these you have to know the width and height of that image, which is marked by question marks in your code as well. So let's assume the height
of your output image were 1
, and the width
were sourceaspect
. In that case, the overall area would be sourceaspect
as well. You have to scale that area by a factor of pixelcount/sourceaspect
to achieve an area of pixelcount
. Which means that you have to scale each edge length by the square root of that factor. So in the end, you have
pixelcount = 1000000.*megapixelcount;
width = round(sqrt(pixelcount*sourceaspect));
height = round(sqrt(pixelcount/sourceaspect));