Skewed frustum/off-axis projection for head tracking in OpenGL
I am trying to do an off-axis projection in my application, changing the perspective of the scene according to the user's head position. Normally, if I had to draw a box on the screen, I would draw it as:

ofBox(350,250,0,50); //ofBox(x, y, z, size); where x, y and z used here are the screen coordinates

To do an off-axis projection here, I am aware that I would have to change the perspective projection as follows:

vertFov = 0.5; near = 0.5; aspRatio = 1.33;
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glFrustum(near * (-vertFov * aspRatio + headX),
          near * (vertFov * aspRatio + headX),
          near * (-vertFov + headY),
          near * (vertFov + headY),
          near, far); //frustum changes as per the position of headX and headY
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(headX * headZ, headY * headZ, 0,
          headX * headZ, headY * headZ, -1,
          0, 1, 0); //eye, center, and up vector
glTranslatef(0, 0, headZ);

For a symmetric frustum in the above case (where headX and headY are zero), the left/right parameters come out to be -0.33/0.33 and the bottom/top parameters -0.25/0.25, establishing my clipping volume along those coordinates. I tried to simulate the off-axis projection using the mouse as a test and did the following:

double mouseXPosition = (double)ofGetMouseX();
double mouseYPosition = (double)ofGetMouseY();
double scrWidth = (double)ofGetWidth();
double scrHeight = (double)ofGetHeight();

headX = ((scrWidth - mouseXPosition) / scrWidth) - 0.5;
headY = (mouseYPosition / scrHeight) - 0.5;
headZ = -0.5; //taken z constant for this mouse test

However, I intend to use a Kinect, which gives me head coordinates on the order of (200, 400, 1000), (-250, 600, 1400), (400, 100, 1400), etc., and I cannot work out how to change the frustum parameters for those head positions. For example, considering 0 to be at the center for the Kinect, if the user moves such that his position is (200, 400, 1000), how would the frustum parameters change?
How will the objects have to be drawn when the z-distance obtained from the Kinect also has to be taken into account? Objects have to become smaller as z increases, which could happen via the glTranslate() call in the off-axis code above, but the two coordinate systems have different scales (glFrustum now sets the clipping volume to roughly [-0.33, 0.33] by [-0.25, 0.25], whereas the Kinect values are on the order of hundreds, e.g. (400, 200, 1000)). How do I apply the z values to glFrustum/gluLookAt then?

Drus answered 23/5, 2013 at 20:51 Comment(2)
An interesting project. I wonder what the chances are of the end result being kids sitting an inch from the TV to get the maximum view of the world? ;) (Joanjoana)
Did you get it to work with the Kinect? A video from the user's perspective would be really cool. (Centrum)
First, you don't want to use gluLookAt. gluLookAt rotates the camera, but the physical screen the user looks at doesn't rotate. It would only be correct if the screen rotated such that its normal kept pointing at the user. The perspective distortion of the off-axis projection takes care of all the rotation we need.

What you need to factor into your model is the position of the screen within the frustum. Consider the following image. The red points are the screen borders. What you need to achieve is that these positions remain constant in the 3D WCS, since the physical screen in the real world also (hopefully) doesn't move. I think this is the key insight to virtual reality and stereoscopy. The screen is something like a window into the virtual reality, and to align the real world with the virtual reality, you need to align the frustum with that window.

[Diagram: the view frustum with the screen borders marked as red points and the far plane fixed in the WCS]

To do that you have to determine the position of the screen in the coordinate system of the Kinect. Assuming the Kinect is on top of the screen, that +y points downwards, and that the unit you're using is millimeters, I would expect these coordinates to be something along the lines of (+-300, 200, 0), (+-300, 500, 0).

Now there are two possibilities for the far plane. You could either choose to use a fixed distance from the camera to the far plane. That would mean the far plane would move backwards if the user moved backwards, possibly clipping objects you'd like to draw. Or you could keep the far plane at a fixed position in the WCS, as shown in the image. I believe the latter is more useful. For the near plane, I think a fixed distance from the camera is ok though.

The inputs are the 3D positions of the screen wcsPtTopLeftScreen and wcsPtBottomRightScreen, the tracked position of the head wcsPtHead, the z value of the far plane wcsZFar (all in the WCS), and the z value of the near plane camZNear (in camera coordinates). We need to compute the frustum parameters in camera coordinates.

camPtTopLeftScreen = wcsPtTopLeftScreen - wcsPtHead;
camPtTopLeftNear = camPtTopLeftScreen / camPtTopLeftScreen.z * camZNear;

and the same with the bottom right point. Also:

camZFar = wcsZFar - wcsPtHead.z

[Diagram: scaling the screen corners onto the near plane to obtain the frustum parameters]

Now the only problem is that the Kinect and OpenGL use different coordinate systems. In the Kinect CS, +y points down, +z points from the user towards the Kinect. In OpenGL, +y points up, +z points towards the viewer. That means we have to multiply y and z by -1:

glFrustum(camPtTopLeftNear.x, camPtBottomRightNear.x,
  -camPtBottomRightNear.y, -camPtTopLeftNear.y, camZNear, camZFar);

If you want a better explanation that also covers stereoscopy, check out this video; I found it insightful and well done.

Quick demo; you might have to adjust wcsWidth, pxWidth, and wcsPtHead.z.

#include <glm/glm.hpp>
#include <glm/ext.hpp>
#include <GL/glut.h>
#include <functional>
#include <functional>

float heightFromWidth;
glm::vec3 camPtTopLeftNear, camPtBottomRightNear;
float camZNear, camZFar;
glm::vec3 wcsPtHead(0, 0, -700);

void moveCameraXY(int pxPosX, int pxPosY)
{
  // Width of the screen in mm and in pixels.
  float wcsWidth = 520.0;
  float pxWidth = 1920.0f;

  float wcsHeight = heightFromWidth * wcsWidth;
  float pxHeight = heightFromWidth * pxWidth;
  float wcsFromPx = wcsWidth / pxWidth;

  glm::vec3 wcsPtTopLeftScreen(-wcsWidth/2.f, -wcsHeight/2.f, 0);
  glm::vec3 wcsPtBottomRightScreen(wcsWidth/2.f, wcsHeight/2.f, 0);
  wcsPtHead = glm::vec3(wcsFromPx * float(pxPosX - pxWidth / 2), wcsFromPx * float(pxPosY - pxHeight * 0.5f), wcsPtHead.z);
  camZNear = 1.0;
  float wcsZFar = 500;

  glm::vec3 camPtTopLeftScreen = wcsPtTopLeftScreen - wcsPtHead;
  camPtTopLeftNear = camZNear / camPtTopLeftScreen.z * camPtTopLeftScreen;
  glm::vec3 camPtBottomRightScreen = wcsPtBottomRightScreen - wcsPtHead;
  camPtBottomRightNear = camPtBottomRightScreen / camPtBottomRightScreen.z * camZNear;
  camZFar = wcsZFar - wcsPtHead.z;

  glutPostRedisplay();
}

void moveCameraZ(int button, int state, int x, int y)
{
  // No mouse wheel in GLUT. :(
  if ((button == 0) || (button == 2))
  {
    if (state == GLUT_DOWN)
      return;
    wcsPtHead.z += (button == 0 ? -1 : 1) * 100;
    glutPostRedisplay();
  }
}

void reshape(int w, int h)
{
  heightFromWidth = float(h) / float(w);
  glViewport(0, 0, w, h);
}

void drawObject(std::function<void(GLdouble)> drawSolid, std::function<void(GLdouble)> drawWireframe, GLdouble size)
{
  glPushAttrib(GL_ALL_ATTRIB_BITS);
  glDisable(GL_LIGHTING);
  glColor4f(1, 1, 1, 1);
  drawSolid(size);
  glColor4f(0.8, 0.8, 0.8, 1);
  glDisable(GL_DEPTH_TEST);
  glLineWidth(1);
  drawWireframe(size);

  glColor4f(0, 0, 0, 1);
  glEnable(GL_DEPTH_TEST);
  glLineWidth(3);
  drawWireframe(size);
  glPopAttrib();
}

void display(void)
{
  glPushAttrib(GL_ALL_ATTRIB_BITS);
  glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);
  glEnable(GL_DEPTH_TEST);

  // In the Kinect CS, +y points down, +z points from the user towards the Kinect.
  // In OpenGL, +y points up, +z points towards the viewer.
  glm::mat4 mvpCube;
  mvpCube = glm::frustum(camPtTopLeftNear.x, camPtBottomRightNear.x,
    -camPtBottomRightNear.y, -camPtTopLeftNear.y, camZNear, camZFar);
  mvpCube = glm::scale(mvpCube, glm::vec3(1, -1, -1));
  mvpCube = glm::translate(mvpCube, -wcsPtHead);
  glMatrixMode(GL_MODELVIEW); glLoadMatrixf(glm::value_ptr(mvpCube));

  drawObject(glutSolidCube, glutWireCube, 140);

  glm::mat4 mvpTeapot = glm::translate(mvpCube, glm::vec3(100, 0, 200));
  mvpTeapot = glm::scale(mvpTeapot, glm::vec3(1, -1, -1)); // teapots are in OpenGL coordinates
  glLoadMatrixf(glm::value_ptr(mvpTeapot));
  glColor4f(1, 1, 1, 1);
  drawObject(glutSolidTeapot, glutWireTeapot, 50);

  glFlush();
  glPopAttrib();
}

void leave(unsigned char, int, int)
{
  exit(0);
}

int main(int argc, char **argv)
{
  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_RGB | GLUT_DEPTH); // depth buffer needed for GL_DEPTH_TEST
  glutCreateWindow("glut test");
  glutDisplayFunc(display);
  glutReshapeFunc(reshape);
  moveCameraXY(0,0);
  glutPassiveMotionFunc(moveCameraXY);
  glutMouseFunc(moveCameraZ);
  glutKeyboardFunc(leave);
  glutFullScreen();
  glutMainLoop();
  return 0;
}

The following images should be viewed from a distance equal to 135% of their width on screen (70 cm on my 52 cm wide screen in fullscreen). [Two screenshots of the demo output.]

Centrum answered 26/5, 2013 at 0:54 Comment(17)
That just changes the whole concept that I had in my mind. I'm going to give this a try, but what I already tried was based on this LINK (source here), and it uses gluLookAt. My understanding of glFrustum was that its parameters establish the clipping volume, and applying gluLookAt is when the distortion happens. Meanwhile, I'll give this method a shot, but can you look at the link? gluLookAt seems to be correct there. (Drus)
Here is another link that describes why you don't want to rotate. While the context is stereoscopy, the same principles apply to head tracking. (Centrum)
I tried getting the perspective distortion using glFrustum alone, without gluLookAt, but wasn't able to get the effect. I added another question about getting distortion with glFrustum alone to check this; it's the same one from which you linked back here. Another link to an off-axis projection question. Using gluLookAt, I am just doing a viewing transformation and moving the 'camera' to another position, not rotating it. Wanted to check with you again here. (Drus)
Oh, that was you. :) Hard to tell anonymous users apart. When you posted that Holotoy link earlier, I wasn't sure anymore, so I added a sample implementation to my answer. Did you try that out? (Centrum)
I tried the same test here. With the camera eye far from the model, here's the screenshot. When I move the eye closer to the model and towards the left, here's the screenshot. The top right corner of the cube is at the same height as the top left corner. Does this mean the viewing transformation is correct here? Translating the scene or moving the camera instead comes to the same thing here, is that it? (Drus)
No, that's not the same test. If the viewing direction center - eye is a multiple of (0,0,-1), then gluLookAt won't rotate, only translate by -eye. But you want the cube to stay at the center of the screen, so the center parameter would have to stay at (0,0,0). If you then move the eye, gluLookAt generates a rotation. And yes, moving the camera by -eye is the same as moving the scene by +eye. (Centrum)
As per the above calculations, camPtTopLeftNear and camPtBottomRightNear become smaller when the user's head moves farther away. This reduces the clipping volume, and the object drawn seems to become bigger in size. However, normally when we move farther away, an object's size gets reduced, doesn't it? I am having a little confusion understanding this. P.S. I'll upload a video very soon and update you here. Thanks again. (Drus)
I also had a doubt as to why we need to calculate camPtTopLeftNear and camPtBottomRightNear. Since we already calculate camPtTopLeftScreen and camPtBottomRightScreen, why can't the calculations be in these coordinates? Why do we need to multiply by (1/camPtTopLeftScreen.z * camZNear)? camZNear in my case is 0.1. (Drus)
As for your first question, the object on the screen shouldn't get smaller. When the user moves away from the screen, the screen in the real world already gets smaller in the user's real-world field of view, and with it the object appears smaller, even though its on-screen size remains almost the same. (Centrum)
Second question: glFrustum expects the top, left, bottom, and right parameters to be on the near plane. camTopLeft/BottomRightScreen are the corner points of the screen, so their z-distance from the eyes is camPtTopLeftScreen.z. To compute the parameters on the near plane, we need a z-distance of camZNear, so we have to multiply these points by camZNear/camPtTopLeftScreen.z. This multiplication scales the corner points in the z direction of the camera coordinate system. Let me see if I can add this to the drawing. (Centrum)
I'm sorry, by camTopLeft/BottomRightScreen I meant "camPtTopLeftScreen or camPtBottomRightScreen"; it has nothing to do with division. (Centrum)
Thanks for the accurate explanation on the second question. About the first, I am not really sure I get it properly yet, so I'll take an example from my side. Assume that for an initial user position A, the left/right parameters of the frustum come out to be -0.25/0.25. However, when the user moves farther away, left/right, being divided by z as above, become -0.15/0.15. So a box that was earlier taking a smaller portion of the screen will take more area and appear to become larger when the user moves back. (Drus)
Contrasting this with the real-world scenario, where an object's size decreases when the user moves back, there seems to be something I am missing. Why should objects on screen become larger with increasing z? (Drus)
Let us continue this discussion in chat. (Centrum)
@AndreasHaferburg I'm trying to implement your code in a vertex shader in Quartz Composer. I use some constants based on the pixels-per-millimeter of my screen. I skip the negative scaling step that's specific to the Kinect. Should I finish with gl_Position = mvpCube * gl_Vertex;? (Burlingame)
@DavidBraun Sounds reasonable; that would replace the glLoadMatrix. If needed, maybe you could open a separate question and link it here. (Centrum)
Good news, I was able to implement Kooima's code in Quartz. He builds the product matrix of a frustum, rotation, and translation. I call that matrix m. I finish my Quartz vertex shader with gl_Position = m * gl_ModelViewMatrix * gl_Vertex; (Burlingame)
The best explanation I have found of how to use glFrustum for head-tracking applications is Robert Kooima's paper "Generalized Perspective Projection":

http://csc.lsu.edu/~kooima/pdfs/gen-perspective.pdf

It also makes stereo projection simple: you just have to switch between the left and right cameras!

Frilling answered 3/7, 2013 at 14:54 Comment(1)
I should have bitten the bullet earlier and worked through this paper. The code was actually very easy to implement. Thanks for sharing! (Burlingame)
