How to convert a 3D point into 2D perspective projection?

Asked 7/4, 2009 at 5:23 Answered 29/4, 2017 at 23:56

I am currently working with using Bezier curves and surfaces to draw the famous Utah teapot. Using Bezier patches of 16 control points, I have been able to draw the teapot and display it using a 'world to camera' function which gives the ability to rotate the resulting teapot, and am currently using an orthographic projection.

The result is that I have a 'flat' teapot, which is expected as the purpose of an orthographic projection is to preserve parallel lines.

However, I would like to use a perspective projection to give the teapot depth. My question is, how does one take the 3D xyz vertex returned from the 'world to camera' function, and convert this into a 2D coordinate. I am wanting to use the projection plane at z=0, and allow the user to determine the focal length and image size using the arrow keys on the keyboard.

I am programming this in java and have all of the input event handler set up, and have also written a matrix class which handles basic matrix multiplication. I've been reading through wikipedia and other resources for a while, but I can't quite get a handle on how one performs this transformation.

Chivalry answered 7/4, 2009 at 5:23 Comment(3)

#702004 – Prelature 16/5, 2009 at 2:11

A very good interactive example of , how three D points are projected on two d space, this mathdisk.com/pub/safi/worksheets/Perspective_Projection – Danelaw 8/7, 2012 at 2:16

Did you get any further with this? I've got some questions about the Bezier part of this task here and here. – Jura 28/3, 2013 at 20:5

123

The standard way to represent 2D/3D transformations nowadays is by using homogeneous coordinates. [x,y,w] for 2D, and [x,y,z,w] for 3D. Since you have three axes in 3D as well as translation, that information fits perfectly in a 4x4 transformation matrix. I will use column-major matrix notation in this explanation. All matrices are 4x4 unless noted otherwise.
The stages from 3D points and to a rasterized point, line or polygon looks like this:

Transform your 3D points with the inverse camera matrix, followed with whatever transformations they need. If you have surface normals, transform them as well but with w set to zero, as you don't want to translate normals. The matrix you transform normals with must be isotropic; scaling and shearing makes the normals malformed.
Transform the point with a clip space matrix. This matrix scales x and y with the field-of-view and aspect ratio, scales z by the near and far clipping planes, and plugs the 'old' z into w. After the transformation, you should divide x, y and z by w. This is called the perspective divide.
Now your vertices are in clip space, and you want to perform clipping so you don't render any pixels outside the viewport bounds. Sutherland-Hodgeman clipping is the most widespread clipping algorithm in use.
Transform x and y with respect to w and the half-width and half-height. Your x and y coordinates are now in viewport coordinates. w is discarded, but 1/w and z is usually saved because 1/w is required to do perspective-correct interpolation across the polygon surface, and z is stored in the z-buffer and used for depth testing.

This stage is the actual projection, because z isn't used as a component in the position any more.

The algorithms:

Calculation of field-of-view

This calculates the field-of view. Whether tan takes radians or degrees is irrelevant, but angle must match. Notice that the result reaches infinity as angle nears 180 degrees. This is a singularity, as it is impossible to have a focal point that wide. If you want numerical stability, keep angle less or equal to 179 degrees.

fov = 1.0 / tan(angle/2.0)

Also notice that 1.0 / tan(45) = 1. Someone else here suggested to just divide by z. The result here is clear. You would get a 90 degree FOV and an aspect ratio of 1:1. Using homogeneous coordinates like this has several other advantages as well; we can for example perform clipping against the near and far planes without treating it as a special case.

Calculation of the clip matrix

This is the layout of the clip matrix. aspectRatio is Width/Height. So the FOV for the x component is scaled based on FOV for y. Far and near are coefficients which are the distances for the near and far clipping planes.

[fov * aspectRatio][        0        ][        0              ][        0       ]
[        0        ][       fov       ][        0              ][        0       ]
[        0        ][        0        ][(far+near)/(far-near)  ][        1       ]
[        0        ][        0        ][(2*near*far)/(near-far)][        0       ]

Screen Projection

After clipping, this is the final transformation to get our screen coordinates.

new_x = (x * Width ) / (2.0 * w) + halfWidth;
new_y = (y * Height) / (2.0 * w) + halfHeight;

Trivial example implementation in C++

#include <vector>
#include <cmath>
#include <stdexcept>
#include <algorithm>

struct Vector
{
    Vector() : x(0),y(0),z(0),w(1){}
    Vector(float a, float b, float c) : x(a),y(b),z(c),w(1){}

    /* Assume proper operator overloads here, with vectors and scalars */
    float Length() const
    {
        return std::sqrt(x*x + y*y + z*z);
    }
    
    Vector Unit() const
    {
        const float epsilon = 1e-6;
        float mag = Length();
        if(mag < epsilon){
            std::out_of_range e("");
            throw e;
        }
        return *this / mag;
    }
};

inline float Dot(const Vector& v1, const Vector& v2)
{
    return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
}

class Matrix
{
    public:
    Matrix() : data(16)
    {
        Identity();
    }
    void Identity()
    {
        std::fill(data.begin(), data.end(), float(0));
        data[0] = data[5] = data[10] = data[15] = 1.0f;
    }
    float& operator[](size_t index)
    {
        if(index >= 16){
            std::out_of_range e("");
            throw e;
        }
        return data[index];
    }
    Matrix operator*(const Matrix& m) const
    {
        Matrix dst;
        int col;
        for(int y=0; y<4; ++y){
            col = y*4;
            for(int x=0; x<4; ++x){
                for(int i=0; i<4; ++i){
                    dst[x+col] += m[i+col]*data[x+i*4];
                }
            }
        }
        return dst;
    }
    Matrix& operator*=(const Matrix& m)
    {
        *this = (*this) * m;
        return *this;
    }

    /* The interesting stuff */
    void SetupClipMatrix(float fov, float aspectRatio, float near, float far)
    {
        Identity();
        float f = 1.0f / std::tan(fov * 0.5f);
        data[0] = f*aspectRatio;
        data[5] = f;
        data[10] = (far+near) / (far-near);
        data[11] = 1.0f; /* this 'plugs' the old z into w */
        data[14] = (2.0f*near*far) / (near-far);
        data[15] = 0.0f;
    }

    std::vector<float> data;
};

inline Vector operator*(const Vector& v, const Matrix& m)
{
    Vector dst;
    dst.x = v.x*m[0] + v.y*m[4] + v.z*m[8 ] + v.w*m[12];
    dst.y = v.x*m[1] + v.y*m[5] + v.z*m[9 ] + v.w*m[13];
    dst.z = v.x*m[2] + v.y*m[6] + v.z*m[10] + v.w*m[14];
    dst.w = v.x*m[3] + v.y*m[7] + v.z*m[11] + v.w*m[15];
    return dst;
}

typedef std::vector<Vector> VecArr;
VecArr ProjectAndClip(int width, int height, float near, float far, const VecArr& vertex)
{
    float halfWidth = (float)width * 0.5f;
    float halfHeight = (float)height * 0.5f;
    float aspect = (float)width / (float)height;
    Vector v;
    Matrix clipMatrix;
    VecArr dst;
    clipMatrix.SetupClipMatrix(60.0f * (M_PI / 180.0f), aspect, near, far);
    /*  Here, after the perspective divide, you perform Sutherland-Hodgeman clipping 
        by checking if the x, y and z components are inside the range of [-w, w].
        One checks each vector component seperately against each plane. Per-vertex
        data like colours, normals and texture coordinates need to be linearly
        interpolated for clipped edges to reflect the change. If the edge (v0,v1)
        is tested against the positive x plane, and v1 is outside, the interpolant
        becomes: (v1.x - w) / (v1.x - v0.x)
        I skip this stage all together to be brief.
    */
    for(VecArr::iterator i=vertex.begin(); i!=vertex.end(); ++i){
        v = (*i) * clipMatrix;
        v /= v.w; /* Don't get confused here. I assume the divide leaves v.w alone.*/
        dst.push_back(v);
    }

    /* TODO: Clipping here */

    for(VecArr::iterator i=dst.begin(); i!=dst.end(); ++i){
        i->x = (i->x * (float)width) / (2.0f * i->w) + halfWidth;
        i->y = (i->y * (float)height) / (2.0f * i->w) + halfHeight;
    }
    return dst;
}

If you still ponder about this, the OpenGL specification is a really nice reference for the maths involved. The DevMaster forums at http://www.devmaster.net/ have a lot of nice articles related to software rasterizers as well.

Ailsa answered 15/5, 2009 at 2:26 Comment(2)

What does this mean? "you perform Sutherland-Hodgeman clipping by checking if the x, y and z components are inside the range of [-w, w]." Namely the -w and w. Where do those values come from? – Celik 29/8, 2012 at 23:22

That trivial example does not even compile. – Eugine 31/3, 2016 at 15:57

I think this will probably answer your question. Here's what I wrote there

Say the camera's at (Xc, Yc, Zc) and the point you want to project is P = (X, Y, Z). The distance from the camera to the 2D plane onto which you are projecting is F (so the equation of the plane is Z-Zc=F). The 2D coordinates of P projected onto the plane are (X', Y').

Then, very simply:

X' = ((X - Xc) * (F/Z)) + Xc
Y' = ((Y - Yc) * (F/Z)) + Yc

If your camera is the origin, then this simplifies to:

X' = X * (F/Z)
Y' = Y * (F/Z)

Roanne answered 7/4, 2009 at 5:33 Comment(0)

To obtain the perspective-corrected co-ordinates, just divide by the z co-ordinate:

xc = x / z
yc = y / z

The above works assuming that the camera is at (0, 0, 0) and you are projecting onto the plane at z = 1 -- you need to translate the co-ords relative to the camera otherwise.

There are some complications for curves, insofar as projecting the points of a 3D Bezier curve will not in general give you the same points as drawing a 2D Bezier curve through the projected points.

Emalee answered 7/4, 2009 at 5:33 Comment(0)

You can project 3D point in 2D using: Commons Math: The Apache Commons Mathematics Library with just two classes.

Example for Java Swing.

import org.apache.commons.math3.geometry.euclidean.threed.Plane;
import org.apache.commons.math3.geometry.euclidean.threed.Vector3D;


Plane planeX = new Plane(new Vector3D(1, 0, 0));
Plane planeY = new Plane(new Vector3D(0, 1, 0)); // Must be orthogonal plane of planeX

void drawPoint(Graphics2D g2, Vector3D v) {
    g2.drawLine(0, 0,
            (int) (world.unit * planeX.getOffset(v)),
            (int) (world.unit * planeY.getOffset(v)));
}

protected void paintComponent(Graphics g) {
    super.paintComponent(g);

    drawPoint(g2, new Vector3D(2, 1, 0));
    drawPoint(g2, new Vector3D(0, 2, 0));
    drawPoint(g2, new Vector3D(0, 0, 2));
    drawPoint(g2, new Vector3D(1, 1, 1));
}

Now you only needs update the planeX and planeY to change the perspective-projection, to get things like this:

enter image description here

Scull answered 19/4, 2013 at 16:26 Comment(0)

Looking at the screen from the top, you get x and z axis.
Looking at the screen from the side, you get y and z axis.

Calculate the focal lengths of the top and side views, using trigonometry, which is the distance between the eye and the middle of the screen, which is determined by the field of view of the screen. This makes the shape of two right triangles back to back.

hw = screen_width / 2

hh = screen_height / 2

fl_top = hw / tan(θ/2)

fl_side = hh / tan(θ/2)

Then take the average focal length.

fl_average = (fl_top + fl_side) / 2

Now calculate the new x and new y with basic arithmetic, since the larger right triangle made from the 3d point and the eye point is congruent with the smaller triangle made by the 2d point and the eye point.

x' = (x * fl_top) / (z + fl_top)

y' = (y * fl_top) / (z + fl_top)

Or you can simply set

x' = x / (z + 1)

and

y' = y / (z + 1)

Alluvion answered 17/8, 2016 at 20:17 Comment(0)

All of the answers address the question posed in the title. However, I would like to add a caveat that is implicit in the text. Bézier patches are used to represent the surface, but you cannot just transform the points of the patch and tessellate the patch into polygons, because this will result in distorted geometry. You can, however, tessellate the patch first into polygons using a transformed screen tolerance and then transform the polygons, or you can convert the Bézier patches to rational Bézier patches, then tessellate those using a screen-space tolerance. The former is easier, but the latter is better for a production system.

I suspect that you want the easier way. For this, you would scale the screen tolerance by the norm of the Jacobian of the inverse perspective transformation and use that to determine the amount of tessellation that you need in model space (it might be easier to compute the forward Jacobian, invert that, then take the norm). Note that this norm is position-dependent, and you may want to evaluate this at several locations, depending on the perspective. Also remember that since the projective transformation is rational, you need to apply the quotient rule to compute the derivatives.

Promiscuity answered 8/5, 2016 at 2:30 Comment(0)

Thanks to @Mads Elvenheim for a proper example code. I have fixed the minor syntax errors in the code (just a few const problems and obvious missing operators). Also, near and far have vastly different meanings in vs.

For your pleasure, here is the compileable (MSVC2013) version. Have fun. Mind that I have made NEAR_Z and FAR_Z constant. You probably dont want it like that.

#include <vector>
#include <cmath>
#include <stdexcept>
#include <algorithm>

#define M_PI 3.14159

#define NEAR_Z 0.5
#define FAR_Z 2.5

struct Vector
{
    float x;
    float y;
    float z;
    float w;

    Vector() : x( 0 ), y( 0 ), z( 0 ), w( 1 ) {}
    Vector( float a, float b, float c ) : x( a ), y( b ), z( c ), w( 1 ) {}

    /* Assume proper operator overloads here, with vectors and scalars */
    float Length() const
    {
        return std::sqrt( x*x + y*y + z*z );
    }
    Vector& operator*=(float fac) noexcept
    {
        x *= fac;
        y *= fac;
        z *= fac;
        return *this;
    }
    Vector  operator*(float fac) const noexcept
    {
        return Vector(*this)*=fac;
    }
    Vector& operator/=(float div) noexcept
    {
        return operator*=(1/div);   // avoid divisions: they are much
                                    // more costly than multiplications
    }

    Vector Unit() const
    {
        const float epsilon = 1e-6;
        float mag = Length();
        if (mag < epsilon) {
            std::out_of_range e( "" );
            throw e;
        }
        return Vector(*this)/=mag;
    }
};

inline float Dot( const Vector& v1, const Vector& v2 )
{
    return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
}

class Matrix
{
public:
    Matrix() : data( 16 )
    {
        Identity();
    }
    void Identity()
    {
        std::fill( data.begin(), data.end(), float( 0 ) );
        data[0] = data[5] = data[10] = data[15] = 1.0f;
    }
    float& operator[]( size_t index )
    {
        if (index >= 16) {
            std::out_of_range e( "" );
            throw e;
        }
        return data[index];
    }
    const float& operator[]( size_t index ) const
    {
        if (index >= 16) {
            std::out_of_range e( "" );
            throw e;
        }
        return data[index];
    }
    Matrix operator*( const Matrix& m ) const
    {
        Matrix dst;
        int col;
        for (int y = 0; y<4; ++y) {
            col = y * 4;
            for (int x = 0; x<4; ++x) {
                for (int i = 0; i<4; ++i) {
                    dst[x + col] += m[i + col] * data[x + i * 4];
                }
            }
        }
        return dst;
    }
    Matrix& operator*=( const Matrix& m )
    {
        *this = (*this) * m;
        return *this;
    }

    /* The interesting stuff */
    void SetupClipMatrix( float fov, float aspectRatio )
    {
        Identity();
        float f = 1.0f / std::tan( fov * 0.5f );
        data[0] = f*aspectRatio;
        data[5] = f;
        data[10] = (FAR_Z + NEAR_Z) / (FAR_Z- NEAR_Z);
        data[11] = 1.0f; /* this 'plugs' the old z into w */
        data[14] = (2.0f*NEAR_Z*FAR_Z) / (NEAR_Z - FAR_Z);
        data[15] = 0.0f;
    }

    std::vector<float> data;
};


inline Vector operator*( const Vector& v, Matrix& m )
{
    Vector dst;
    dst.x = v.x*m[0] + v.y*m[4] + v.z*m[8] + v.w*m[12];
    dst.y = v.x*m[1] + v.y*m[5] + v.z*m[9] + v.w*m[13];
    dst.z = v.x*m[2] + v.y*m[6] + v.z*m[10] + v.w*m[14];
    dst.w = v.x*m[3] + v.y*m[7] + v.z*m[11] + v.w*m[15];
    return dst;
}

typedef std::vector<Vector> VecArr;
VecArr ProjectAndClip( int width, int height, const VecArr& vertex )
{
    float halfWidth = (float)width * 0.5f;
    float halfHeight = (float)height * 0.5f;
    float aspect = (float)width / (float)height;
    Vector v;
    Matrix clipMatrix;
    VecArr dst;
    clipMatrix.SetupClipMatrix( 60.0f * (M_PI / 180.0f), aspect);
    /*  Here, after the perspective divide, you perform Sutherland-Hodgeman clipping
    by checking if the x, y and z components are inside the range of [-w, w].
    One checks each vector component seperately against each plane. Per-vertex
    data like colours, normals and texture coordinates need to be linearly
    interpolated for clipped edges to reflect the change. If the edge (v0,v1)
    is tested against the positive x plane, and v1 is outside, the interpolant
    becomes: (v1.x - w) / (v1.x - v0.x)
    I skip this stage all together to be brief.
    */
    for (VecArr::const_iterator i = vertex.begin(); i != vertex.end(); ++i) {
        v = (*i) * clipMatrix;
        v /= v.w; /* Don't get confused here. I assume the divide leaves v.w alone.*/
        dst.push_back( v );
    }

    /* TODO: Clipping here */

    for (VecArr::iterator i = dst.begin(); i != dst.end(); ++i) {
        i->x = (i->x * (float)width) / (2.0f * i->w) + halfWidth;
        i->y = (i->y * (float)height) / (2.0f * i->w) + halfHeight;
    }
    return dst;
}
#pragma once

Bilabial answered 29/4, 2017 at 23:56 Comment(7)

this is horrible code. For example your operator/= wastes 2 divisions and returns a reference to a temporary. Instead it should change this and return *this. Moreover, it's C++, but the question is java. – Beware 21/6, 2017 at 15:13

@Beware I've fixed the dangerous reference-to-temporary, thanks for notifying me of this. As I have stated in the answer, this is just a compileable version of Mads Elvenheims answer; I have not stated that this is very efficient, but It had helped me to grasp the details, and thus I wanted to share it. Regarding Java vs C++, the accepted answer I copied here is also written in C++. – Bilabial 21/6, 2017 at 15:24

Can't see any fix to Vector::operator=/. – Beware 21/6, 2017 at 15:27

Changed the operator/= as per your suggestion. You're right in that it should change the vector itself. – Bilabial 21/6, 2017 at 15:38

@Beware Which divisions would you deem wasteful? I have been pondering a good amount of time now, but fail to recognize anything unneccesary there. – Bilabial 21/6, 2017 at 15:58

I have edited your code. I hope you don't mind. Also, why do you throw an error in Unit() if the length is below 1e-6? What if all 3 vector components are smaller? Either you don't do anything (and return NaNs if length=0), or you only check for length==0. – Beware 21/6, 2017 at 17:2

@Beware Thanks for your input. I just fixed the obvious compiler errors in the accepted answer; Design questions like these were beyond the scope of my fixes. – Bilabial 22/6, 2017 at 8:19

I'm not sure at what level you're asking this question. It sounds as if you've found the formulas online, and are just trying to understand what it does. On that reading of your question I offer:

Imagine a ray from the viewer (at point V) directly towards the center of the projection plane (call it C).
Imagine a second ray from the viewer to a point in the image (P) which also intersects the projection plane at some point (Q)
The viewer and the two points of intersection on the view plane form a triangle (VCQ); the sides are the two rays and the line between the points in the plane.
The formulas are using this triangle to find the coordinates of Q, which is where the projected pixel will go

Rune answered 7/4, 2009 at 5:34 Comment(2)

Actually quite the opposite - I have found several diagrams and explanations as to what a projection plane is, but I haven't started coding anything because I don't have a formula to work with. I'm not even sure what the "forumlas" are that you're referring to. – Chivalry 7/4, 2009 at 5:45

@Zachary -- The formulas contained in the other two answers, the Wikipedia page, etc. Basically X' = (X-C1)*C2/Z + C3 and likewise for Y. – Rune 7/4, 2009 at 6:16

I know it's an old topic but your illustration is not correct, the source code sets up the clip matrix correct.

[fov * aspectRatio][        0        ][        0              ][        0       ]
[        0        ][       fov       ][        0              ][        0       ]
[        0        ][        0        ][(far+near)/(far-near)  ][(2*near*far)/(near-far)]
[        0        ][        0        ][        1              ][        0       ]

some addition to your things:

This clip matrix works only if you are projecting on static 2D plane if you want to add camera movement and rotation:

viewMatrix = clipMatrix * cameraTranslationMatrix4x4 * cameraRotationMatrix4x4;

this lets you rotate the 2D plane and move it around..-

Radioluminescence answered 22/7, 2010 at 8:29 Comment(1)

No, you are wrong. It depends on whether you are using column or row-major notation. You're thinking of OpenGL which uses column-major. – Denmark 11/8, 2013 at 4:39

You might want to debug your system with spheres to determine whether or not you have a good field of view. If you have it too wide, the spheres with deform at the edges of the screen into more oval forms pointed toward the center of the frame. The solution to this problem is to zoom in on the frame, by multiplying the x and y coordinates for the 3 dimensional point by a scalar and then shrinking your object or world down by a similar factor. Then you get the nice even round sphere across the entire frame.

I'm almost embarrassed that it took me all day to figure this one out and I was almost convinced that there was some spooky mysterious geometric phenomenon going on here that demanded a different approach.

Yet, the importance of calibrating the zoom-frame-of-view coefficient by rendering spheres cannot be overstated. If you do not know where the "habitable zone" of your universe is, you will end up walking on the sun and scrapping the project. You want to be able to render a sphere anywhere in your frame of view an have it appear round. In my project, the unit sphere is massive compared to the region that I'm describing.

Also, the obligatory wikipedia entry: Spherical Coordinate System

Passable answered 28/1, 2014 at 21:12 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++