How can I remove noise from this video sequence?
Hello, I am trying to do some image processing. I use a Microsoft Kinect to detect humans in a room. I get depth data, do some background-subtraction work, and end up with a video sequence like this when a person enters the scene and walks around:

http://www.screenr.com/h7f8

I put up a video so that you can see the behaviour of the noise. Different colors represent different levels of depth; white represents empty space. As you can see, it is pretty noisy, especially the red patches.

I need to get rid of everything except the human as much as possible. When I do erosion/dilation (using a very big window size) I can get rid of a lot of the noise, but I wondered whether there are other methods I could use. The red noise in particular is hard to remove with erosion/dilation.
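The erosion/dilation step described above amounts to a morphological opening on a binary mask. As a minimal, pure-Python sketch of the idea (the 3x3 window and the toy mask are assumptions for illustration, not anyone's actual code):

```python
def erode(mask, k=1):
    """Binary erosion: a pixel survives only if every pixel in its
    (2k+1)x(2k+1) window is also set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                mask[yy][xx]
                for yy in range(max(0, y - k), min(h, y + k + 1))
                for xx in range(max(0, x - k), min(w, x + k + 1))))
    return out

def dilate(mask, k=1):
    """Binary dilation: a pixel is set if any pixel in its window is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                mask[yy][xx]
                for yy in range(max(0, y - k), min(h, y + k + 1))
                for xx in range(max(0, x - k), min(w, x + k + 1))))
    return out

def opening(mask, k=1):
    """Opening = erosion then dilation: removes blobs smaller than the
    window while roughly preserving the shape of larger regions."""
    return dilate(erode(mask, k), k)
```

This also shows why a very big window is a blunt tool: opening erases any speck smaller than the window, but it equally erases thin parts of the person (arms, hands), which is why blob- or contour-based filtering is often suggested instead.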

Some notes:

1) A better background subtraction could be done if we knew when there were no humans in the scene, but our background subtraction is fully automatic: it works even when there are humans in the scene, and even when the camera is moved, etc. So this is the best background subtraction we can get right now.

2) The algorithm will run on an embedded system, in real time, so the more efficient and simple the algorithm, the better. It doesn't have to be perfect. That said, complicated signal-processing techniques are also welcome (we might use them on another project that does not need embedded, real-time processing).

3) I don't need actual code, just ideas.

Archiplasm answered 28/8, 2012 at 12:8 Comment(2)
Knowing more about the background subtraction could help; i.e. why is there noise left in the image? – Spondylitis
What SDK/driver are you using (e.g. MS Kinect SDK, OpenNI, libfreenect, etc.)? – Farnesol

Just my two cents:

If you don't mind using the SDK for that, then you can very easily keep only the person pixels using the PlayerIndexBitmask, as Outlaw Lemur shows.

Now, you may not want to be dependent on the drivers for that and may want to do it at the image-processing level instead. An approach we tried in a project, which worked pretty well, was contour based: we began with background subtraction, then detected the largest contour in the image, assuming it was the person (since the remaining noise was usually very small blobs), and we filled that contour and kept only it. You could also use some kind of median filtering as a first pass.
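The keep-the-largest-blob idea above can be sketched with a plain flood fill over a binary mask. This is an illustrative pure-Python version (4-connectivity and the toy mask are assumptions; a real implementation would use something like OpenCV's contour functions):

```python
from collections import deque

def largest_blob(mask):
    """Keep only the largest 4-connected component of a binary mask,
    discarding the small blobs left over after background subtraction."""
    h, w = len(mask), len(mask[0])
    label = [[0] * w for _ in range(h)]
    best_size, best_id, next_id = 0, 0, 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not label[y][x]:
                # flood-fill a new component and measure its size
                next_id += 1
                label[y][x] = next_id
                q, size = deque([(y, x)]), 0
                while q:
                    cy, cx = q.popleft()
                    size += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not label[ny][nx]:
                            label[ny][nx] = next_id
                            q.append((ny, nx))
                if size > best_size:
                    best_size, best_id = size, next_id
    return [[int(label[y][x] == best_id) for x in range(w)] for y in range(h)]
```

Note the assumption baked into this approach: it keeps exactly one blob, so it fails if two people are in the scene or if the person is split into disconnected pieces by the noise.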

Of course, this is neither perfect nor suitable in every case, and there are probably far better methods. But I'm throwing it out there in case it helps you come up with ideas.

Hrutkay answered 28/8, 2012 at 19:52 Comment(0)

Take a look at EyesWeb.

It is a design platform that supports the Kinect device, and you can apply noise filters to its outputs. It is a very useful and simple tool for designing multimodal systems.

Aegospotami answered 28/8, 2012 at 21:38 Comment(0)

This is pretty simple, assuming you are using the Kinect SDK. I would follow this video for depth basics and do something like this:

    private byte[] GenerateColoredBytes(DepthImageFrame depthFrame)
    {

        //get the raw data from kinect with the depth for every pixel
        short[] rawDepthData = new short[depthFrame.PixelDataLength];
        depthFrame.CopyPixelDataTo(rawDepthData); 

        //use depthFrame to create the image to display on-screen
        //depthFrame contains color information for all pixels in image
        //Height x Width x 4 (Red, Green, Blue, empty byte)
        Byte[] pixels = new byte[depthFrame.Height * depthFrame.Width * 4];

        //Bgr32  - Blue, Green, Red, empty byte
        //Bgra32 - Blue, Green, Red, transparency 
        //You must set transparency for Bgra as .NET defaults a byte to 0 = fully transparent

        //hardcoded locations to Blue, Green, Red (BGR) index positions       
        const int BlueIndex = 0;
        const int GreenIndex = 1;
        const int RedIndex = 2;


        //loop through all distances
        //pick a RGB color based on distance
        for (int depthIndex = 0, colorIndex = 0; 
            depthIndex < rawDepthData.Length && colorIndex < pixels.Length; 
            depthIndex++, colorIndex += 4)
        {
            //get the player (requires skeleton tracking enabled for values)
            int player = rawDepthData[depthIndex] & DepthImageFrame.PlayerIndexBitmask;

            //gets the depth value
            int depth = rawDepthData[depthIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

            //.9M or 2.95'
            if (depth <= 900)
            {
                //we are very close
                pixels[colorIndex + BlueIndex] = Colors.White.B;
                pixels[colorIndex + GreenIndex] = Colors.White.G;
                pixels[colorIndex + RedIndex] = Colors.White.R;
            }
            // .9M - 2M or 2.95' - 6.56'
            else if (depth > 900 && depth < 2000)
            {
                //we are a bit further away
                pixels[colorIndex + BlueIndex] = Colors.White.B;
                pixels[colorIndex + GreenIndex] = Colors.White.G;
                pixels[colorIndex + RedIndex] = Colors.White.R;
            }
            // 2M+ or 6.56'+ (plain else, so depth == 2000 is not skipped)
            else
            {
                //we are the farthest
                pixels[colorIndex + BlueIndex] = Colors.White.B;
                pixels[colorIndex + GreenIndex] = Colors.White.G;
                pixels[colorIndex + RedIndex] = Colors.White.R;
            }


            ////equal coloring for monochromatic histogram
            //byte intensity = CalculateIntensityFromDepth(depth);
            //pixels[colorIndex + BlueIndex] = intensity;
            //pixels[colorIndex + GreenIndex] = intensity;
            //pixels[colorIndex + RedIndex] = intensity;


            //Color all players "gold"
            if (player > 0)
            {
                pixels[colorIndex + BlueIndex] = Colors.Gold.B;
                pixels[colorIndex + GreenIndex] = Colors.Gold.G;
                pixels[colorIndex + RedIndex] = Colors.Gold.R;
            }

        }


        return pixels;
    }

This turns everything except humans white, and the humans are gold. Hope this helps!

EDIT

I know you didn't necessarily want code, just ideas, so I would put it this way: find an algorithm that gives you the depth per pixel and one that tells you which pixels belong to humans, then color everything white except the humans. I have provided all of this above, in case it wasn't clear what was going on. I also have an image of the final program.

[image: screenshot of the final program output]

Note: I added the second depth frame for perspective.

Rexer answered 28/8, 2012 at 13:8 Comment(0)

I may be wrong (I'd need the unprocessed video to be sure), but I'd tend to say that you are trying to get rid of illumination changes.

This is what makes people detection really difficult in 'real' environments.

You can check out this other SO question for some links.

I used to detect humans in real time in the same configuration as you, but with monocular vision. In my case, a really good descriptor was LBP (Local Binary Patterns), which is mainly used for texture classification. It is quite simple to put into practice (there are implementations all over the web).

The LBPs were basically used to define an area of interest where movement is detected, so that I could process only part of the image and get rid of all that noise.
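For reference, the basic 8-neighbour LBP operator is tiny. This is a sketch of the textbook formulation (the clockwise bit ordering starting at the top-left is a convention chosen for this example; implementations vary):

```python
def lbp_code(img, y, x):
    """Basic 8-neighbour LBP at pixel (y, x): each neighbour contributes
    one bit, set when the neighbour is >= the centre pixel. The result
    is a texture code in [0, 255] that is invariant to monotonic
    illumination changes."""
    c = img[y][x]
    neighbours = [img[y - 1][x - 1], img[y - 1][x], img[y - 1][x + 1],
                  img[y][x + 1], img[y + 1][x + 1], img[y + 1][x],
                  img[y + 1][x - 1], img[y][x - 1]]
    code = 0
    for bit, v in enumerate(neighbours):
        if v >= c:
            code |= 1 << bit
    return code
```

Because the code depends only on the sign of the neighbour-minus-centre differences, it is robust to the illumination changes mentioned above, which is what makes it attractive for this kind of masking.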

This paper, for example, uses LBP for grayscale correction of images.

Hope that brings some new ideas.

Dishpan answered 28/8, 2012 at 13:24 Comment(0)
