Working with depth data - Kinect

I just started learning about Kinect through some quick start videos and was trying out the code to work with depth data.

However, I don't understand how the distance is calculated using bit-shifting, or the other formulas that are used to compute further values from this depth data.

http://channel9.msdn.com/Series/KinectSDKQuickstarts/Working-with-Depth-Data

Are these Kinect-specific details explained anywhere in the documentation? Any help would be appreciated.

Thanks

Fenske answered 10/11, 2011 at 17:32

Pixel depth

When you don't have the Kinect set up to detect players, the depth frame is simply an array of bytes, with two bytes representing a single depth measurement.

So it is laid out just like a 16 bit color image, except that each sixteen bits represent a depth rather than a color.

If the array were for a hypothetical 2x2 pixel depth image, you might see: [0x12 0x34 0x56 0x78 0x91 0x23 0x45 0x67] which would represent the following four pixels:

AB
CD

A = (0x34 << 8) + 0x12
B = (0x78 << 8) + 0x56
C = (0x23 << 8) + 0x91
D = (0x67 << 8) + 0x45

(The parentheses are needed because + binds tighter than << in C style languages.)

The << 8 simply moves that byte into the upper 8 bits of a 16 bit number. It's the same as multiplying it by 256, so you could instead do A = 0x34 * 256 + 0x12. Note that the second byte of each pair is the high byte. The whole 16 bit numbers become 0x3412, 0x7856, 0x2391, 0x6745.

In simpler terms, it's like saying I have 329 items and 456 thousands of items. To get the total, I multiply the 456 by 1,000 and add it to the 329, giving 456,329. Equivalently, I could "shift" the 456 to the left by three digits, which makes it 456000, the same as multiplying by 1,000. The Kinect has broken the whole number up into two pieces in the same way, and you simply have to put them back together.

So in decimal, shifting by whole digits and multiplying by a power of 10 are the same thing. In binary the same holds for powers of 2: 2^8 is 256, so multiplying by 256 is the same as shifting left by 8 bits.

And that would be your four pixel depth image - each resulting 16 bit number represents the depth at that pixel.
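As a minimal sketch of the byte pairing described above, here is the same 2x2 example in Python. The function name assemble_depths is made up for illustration and is not part of the Kinect SDK. It assumes the low byte comes first in each pair, as in the example array.

```python
def assemble_depths(raw):
    """Combine consecutive byte pairs (low byte first) into 16 bit values.

    Hypothetical helper for illustration; not a Kinect SDK function.
    """
    values = []
    for i in range(0, len(raw), 2):
        low, high = raw[i], raw[i + 1]
        # Shift the high byte into the upper 8 bits, then add the low byte.
        values.append((high << 8) + low)
    return values


raw = bytes([0x12, 0x34, 0x56, 0x78, 0x91, 0x23, 0x45, 0x67])
depths = assemble_depths(raw)
print([hex(d) for d in depths])  # ['0x3412', '0x7856', '0x2391', '0x6745']
```

Replacing (high << 8) with high * 256 gives identical results, which is the equivalence the explanation above relies on.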

Player depth

When you enable player data it becomes a little more interesting. The bottom three bits of the whole 16 bit number tell you which player that pixel belongs to.

To simplify things, ignore the complicated method they use to get the remaining 13 bits of depth data, and just do the above, and steal the lower three bits:

A = (0x34 << 8) + 0x12
B = (0x78 << 8) + 0x56
C = (0x23 << 8) + 0x91
D = (0x67 << 8) + 0x45

Ap = A % 8
Bp = B % 8
Cp = C % 8
Dp = D % 8

A = A / 8
B = B / 8
C = C / 8
D = D / 8

Now the pixel A has player Ap and depth A. The % gets the remainder of the division: take A, divide it by 8, and the remainder is the player number. The quotient of the integer division is the depth, so after A = A / 8 the player bits are gone and A contains only the depth.
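The remainder and division steps can be sketched in Python as follows. The name split_player_depth is hypothetical, and the input values are the four 16 bit numbers from the example rather than real sensor data. Python needs // for integer division, since a plain / would produce a float.

```python
def split_player_depth(value):
    """Split a 16 bit value into (player, depth).

    Hypothetical helper: the low 3 bits are the player index,
    the remaining 13 bits are the depth.
    """
    player = value % 8   # remainder of dividing by 8 = low three bits
    depth = value // 8   # integer quotient = upper thirteen bits
    return player, depth


for value in (0x3412, 0x7856, 0x2391, 0x6745):
    player, depth = split_player_depth(value)
    print(hex(value), "player", player, "depth", depth)
```

For the first pixel, 0x3412 is 13330, which splits into player 2 and depth 1666.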

If you don't need player support, at least at the beginning of your development, skip this and just use the first method. If you do need player support, though, this is one of many ways to get it. There are faster methods, but the compiler usually turns the above division and remainder (modulus) operations into more efficient bitwise logic operations so you don't need to worry about it, generally.
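For reference, the bitwise form that the division and remainder reduce to might look like the sketch below. The constant and function names are assumptions chosen for illustration. The loop simply checks that both forms agree for every possible 16 bit value.

```python
PLAYER_BITS = 3                        # number of low bits holding the player index
PLAYER_MASK = (1 << PLAYER_BITS) - 1   # 0b111 == 7


def split_bitwise(value):
    """Return (player, depth) using a mask and a shift (hypothetical helper)."""
    player = value & PLAYER_MASK   # keep only the low three bits
    depth = value >> PLAYER_BITS   # drop the low three bits
    return player, depth


# The mask and shift agree with % 8 and // 8 for every unsigned 16 bit value.
all_match = all(split_bitwise(v) == (v % 8, v // 8) for v in range(1 << 16))
print(all_match)
```

The equivalence holds because 8 is a power of two: dividing by 2^3 discards exactly the three lowest bits, and the remainder is exactly those bits.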

Culpa answered 10/11, 2011 at 18:6 Comment(2)
Thanks a lot for this clear explanation! That fully explains things. I was also curious to know whether these details are actually mentioned anywhere in the documentation? - Fenske
@Fenske I doubt it. This sort of discussion is considered low level - the equations used in the video are more elegant forms of the above. In other words, the developers who made the documentation and videos assume that programmers using the Kinect will already have a solid understanding of C style array representation, bit shifting, and bit logic. Over time you'll pick up on a lot of these concepts and this sort of thing will become easier. - Culpa
