Kinect for Windows v2 depth to color image misalignment

Currently I am developing a tool for the Kinect for Windows v2 (similar to the one bundled with the Xbox One). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB image using OpenCV. But I see that my hand gets duplicated when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.

Here is an example of it: [image: error]

And here is the code snippet that creates the image (rgbd_im in the example):

void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    // Map every depth pixel to a position in the color image.
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    if (FAILED(hr)) return;

    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            UINT16 depth = depth_im.at<UINT16>(i, j);
            // Keep only valid depth values inside the [min_z%, max_z%] band selected in the GUI.
            if (depth > 0 && depth < maxVal * max_z / 100.0 && depth > maxVal * min_z / 100.0){
                ColorSpacePoint colorPoint = m_pColorCoordinates[i * cDepthWidth + j];
                // Round the mapped color coordinates to the nearest pixel.
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}

Does anyone have a clue of how to solve this? How to prevent this duplication?

Thanks in advance

UPDATE:

If I do a simple depth image thresholding I obtain the following image: [image: thresholding]

This is more or less what I expected to happen: no duplicate hand in the background. Is there a way to prevent this duplicate hand from appearing?
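
By "simple depth image thresholding" I mean roughly the following (a minimal sketch with a hypothetical helper, assuming depth_im is the CV_16UC1 depth frame and min_z / max_z are the same GUI-selected percentages used in create_rgbd; it needs <opencv2/opencv.hpp> and <algorithm>):

cv::Mat threshold_depth(const cv::Mat& depth_im, double min_z, double max_z){
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    cv::Mat mask;
    // Keep only valid pixels (depth > 0) inside the [min_z%, max_z%] band.
    cv::inRange(depth_im,
                cv::Scalar(std::max(1.0, maxVal * min_z / 100.0)),
                cv::Scalar(maxVal * max_z / 100.0),
                mask);                       // CV_8UC1 mask: 255 inside the band
    return mask;
}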

Asymptomatic answered 11/9, 2014 at 13:18 Comment(15)
Where does this mapping come from? Most probably you have to adjust the calibration between the depth image and the color image, since the predefined calibration isn't perfect, so you have to perform your own calibration. Have a look at: nicolas.burrus.name/index.php/Research/KinectCalibrationYahwistic
It comes from the Kinect SDK v2. I was expecting to use the one that comes with the firmware/SDK, which uses the intrinsics of the camera to do these calculations... but I think the error is huge compared to other cameras' firmware/software, like the PrimeSense ones with OpenNI. I expected better results, or at least similar to other cameras... Thanks for the link though :)Asymptomatic
AFAIK the auto-calibration data of the Kinect saved in the firmware isn't that great. But maybe I'm wrong there.Yahwistic
Hi, I am also using the Kinect for Windows v2. Have you been able to perform the stereo calibration of the Kinect? I am using MATLAB to do that, but since the camera resolutions are not equal, I am sort of stuck on how to do it.Mauricemauricio
@user2441667 I don't know how it would be in MATLAB, but in C++ the SDK way of doing this is the code snippet I put in my post, with the m_pCoordinateMapper->MapDepthFrameToColorSpace function. The other way would be doing it manually... It is important to notice that the pixels that are not "seen" by the color camera will give you, with the code above, a duplicate pixel of another place (like what I show in the picture with the duplicate hand).Asymptomatic
It is quite late to ask. Did you find a solution for this problem?Conduction
@Hwathanie Since it was just an experiment, I haven't dedicated much time to it recently. Some new updates to the SDK may have solved this issue; at that moment I used the first release of the SDK.Asymptomatic
Hi @Asymptomatic, at the moment I am working on something similar and I am trying to adapt your function above in order to try some things. Thus, I would like to ask you, if you remember, about some variables and how they are initialized. For example, can you tell me the values of max_z and min_z, and how m_pColorCoordinates is initialized? I figured out that it is of type ColorSpacePoint*, but how is it declared? Also I guess cDepthWidth/Height and cColorWidth/Height are the dimensions of the depth and color image respectively, right? Thank you in advance.Starks
@theodore Hi, max_z and min_z are just some thresholding variables for the depth value, selected in the simple GUI I created. m_pColorCoordinates, I think, is the mapping between the color and depth images. This code is an adaptation of one of the examples in the Kinect SDK; you can check what I did here. It was abandoned since I had to work on other things.Asymptomatic
Hi @Asymptomatic, thanks for the feedback and for pointing me to the project's source code. It is really helpful :-). Thanks again!Starks
Same problem here. Were you using the developer pre-release or the final version of the Kinect 2? I'm currently here with a very early version of the hardware and I'm wondering if that's the cause.Heigl
@Heigl I do not have access to a Kinect 2 at the moment, but I have a possible solution and explanation of the problem. I will probably post it over the upcoming weekend.Asymptomatic
That would be very helpful, thanks in advance!Heigl
@Asymptomatic coming back to this issue, I'm still facing the same problem. Would you mind sharing your knowledge? Thanks.Heigl
@Heigl Sorry for the late answer, but I finally explained the problem behind all this and a naive solution to it :)Asymptomatic

Finally I got some time to write the long-awaited answer.

Let's start with some theory to understand what is really happening, and then a possible answer.

We should start by knowing how to go from a 3D point cloud, whose coordinate system origin is the depth camera, to an image in the image plane of the RGB camera. To do that it is enough to use the pinhole camera model:

s · [u, v, 1]^T = K · [R | t] · [X, Y, Z, 1]^T

Here, u and v are the coordinates in the image plane of the RGB camera, and s is just the projective scale factor. The first matrix on the right side of the equation, K, is the camera matrix, AKA the intrinsics of the RGB camera. The following matrix, [R | t], is the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.

Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, actually more than one point can project to the same pixel...

To put it in other words, in the context of the problem in the question:

The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which can easily be converted to 3D points. The SDK gives you the projection, but several points can end up on the same pixel (usually, the bigger the distance in the z axis between two neighboring points, the more easily this problem appears).

Now, the big question: how can you avoid this? Well, I am not sure it is possible using just the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering directly. However, you may assume the Z values will be quite similar before and after the transformation and use the ones from the original point cloud (at your own risk).
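
A minimal sketch of that idea, reusing m_pColorCoordinates filled exactly as in the question and using the ORIGINAL depth values as a stand-in for the Z after the extrinsics (so it is only an approximation; the 20 mm tolerance is an arbitrary value to tune, and it needs <vector> and <cmath>):

std::vector<UINT16> bestDepth(cColorWidth * cColorHeight, 0xFFFF);

// First pass: remember the closest depth value that lands on each color pixel.
for (int i = 0; i < cDepthHeight; i++){
    for (int j = 0; j < cDepthWidth; j++){
        UINT16 d = depth_im.at<UINT16>(i, j);
        if (d == 0) continue;                                   // invalid depth
        ColorSpacePoint cp = m_pColorCoordinates[i * cDepthWidth + j];
        int cx = (int)floor(cp.X + 0.5);
        int cy = (int)floor(cp.Y + 0.5);
        if (cx < 0 || cx >= cColorWidth || cy < 0 || cy >= cColorHeight) continue;
        UINT16& best = bestDepth[cy * cColorWidth + cx];
        if (d < best) best = d;
    }
}

// Second pass: copy the color only if this pixel is not occluded by a nearer one.
for (int i = 0; i < cDepthHeight; i++){
    for (int j = 0; j < cDepthWidth; j++){
        UINT16 d = depth_im.at<UINT16>(i, j);
        if (d == 0) continue;
        ColorSpacePoint cp = m_pColorCoordinates[i * cDepthWidth + j];
        int cx = (int)floor(cp.X + 0.5);
        int cy = (int)floor(cp.Y + 0.5);
        if (cx < 0 || cx >= cColorWidth || cy < 0 || cy >= cColorHeight) continue;
        if (d <= bestDepth[cy * cColorWidth + cx] + 20)         // ~20 mm tolerance
            rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(cy, cx);
        // otherwise leave rgbd_im black there: that is the "duplicate hand" region
    }
}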

If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel; if there is already a point mapped there, compare the z values and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is kind of a naive way; you can probably come up with better ones, since the problem is now clear :)
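
And a naive sketch of the manual version, with a proper Z-buffer in the RGB image plane (K, R and t are assumed to come from your own calibration of the camera pair, they are NOT provided by this snippet, and the point cloud is expressed in the depth camera frame):

#include <opencv2/opencv.hpp>
#include <vector>
#include <limits>
#include <cmath>

// Project a point cloud into the RGB image; when two points fall on the same pixel,
// only the one closest to the RGB camera keeps a color.
// K = RGB intrinsics, [R | t] = depth-to-RGB extrinsics.
void map_cloud_to_color(const std::vector<cv::Point3f>& cloud,
                        const cv::Mat& rgb_im,
                        const cv::Matx33f& K,
                        const cv::Matx33f& R, const cv::Vec3f& t,
                        std::vector<cv::Vec3b>& colors)
{
    const int W = rgb_im.cols, H = rgb_im.rows;
    cv::Mat zbuf(H, W, CV_32F, cv::Scalar(std::numeric_limits<float>::max()));
    cv::Mat owner(H, W, CV_32S, cv::Scalar(-1));   // index of the point owning each pixel

    colors.assign(cloud.size(), cv::Vec3b(0, 0, 0));

    for (size_t k = 0; k < cloud.size(); ++k){
        cv::Vec3f p = R * cv::Vec3f(cloud[k]) + t; // point in the RGB camera frame
        if (p[2] <= 0.0f) continue;                // behind the camera
        // Pinhole projection: u = fx*X/Z + cx, v = fy*Y/Z + cy
        int u = (int)std::floor(K(0, 0) * p[0] / p[2] + K(0, 2) + 0.5f);
        int v = (int)std::floor(K(1, 1) * p[1] / p[2] + K(1, 2) + 0.5f);
        if (u < 0 || u >= W || v < 0 || v >= H) continue;

        float& z = zbuf.at<float>(v, u);
        if (p[2] < z){                             // closer than the current owner
            int prev = owner.at<int>(v, u);
            if (prev >= 0) colors[prev] = cv::Vec3b(0, 0, 0);   // occluded point loses its color
            z = p[2];
            owner.at<int>(v, u) = (int)k;
            colors[k] = rgb_im.at<cv::Vec3b>(v, u);
        }
        // if p[2] >= z the point is occluded: it stays uncolored (no duplicate hand)
    }
}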

I hope it is clear enough.

P.S.: I do not have a Kinect 2 at the moment, so I can't check whether there is an update related to this issue or whether the same thing still happens. I used the first released version (not the pre-release) of the SDK... so a lot of changes may have happened... If someone knows whether this was solved, just leave a comment :)

Asymptomatic answered 8/2, 2017 at 21:4 Comment(0)

I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.

A few notes:

  • Include the BodyIndexFrame source to your frame reader
  • Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
  • Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY

Here is my approach when a frame arrives (it's in C#):

depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);

_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);

Array.Clear(_displayPixels, 0, _displayPixels.Length);

for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];

    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);

        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];

            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;

                _displayPixels[sourceIndex] = _colorData[sourceIndex++];    // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++];    // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++];    // R
                _displayPixels[sourceIndex] = 0xff;                         // A
            }
        }
    }
}

Here is the initialization of the arrays:

BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;

_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];

Notice that the _depthPoints array has a 1920x1080 size.

Once again, the most important thing is to use the BodyIndexFrame source.

Guillermo answered 18/9, 2014 at 19:13 Comment(4)
I think the example image with the hand is a little bit misleading. We are trying to implement a tool to record a data set in different formats. One of the things that we must be able to do with this data set is to create a colored point cloud, but the points in the background get the wrong color (it looks like a duplicate hand). Is there a way to remove just these "invalid" points (we consider them invalid since there should not be a mapping to color for them, probably because these pixels are not visible in the color image)?Asymptomatic
Oh, I got it. In your example, you create an RGB image using the depth frame as a base. So, you won't be able to project it on top of the 1920x1080 image. In the code I provided you with, an RGBA bitmap of 1920x1080 size is generated. As a result, you can place it on top of another 1920x1080 bitmap. Did you try that?Guillermo
I tried it, and got something without duplicates, but I think a lot of invalid pixels are gone, which makes me wonder how good this mapping is. If you look at the window over the door in the example, it has a lot of invalid pixels in the depth image. When I do the mapping that you suggest, most of them actually disappear... so I was wondering, what happens to them? Maybe some interpolation problem in the framework?Asymptomatic
Not sure, unless we have the final version. All this is subject to change.Guillermo
