For this particular image, you need not work in full color space, but instead can work on the intensity alone (the "V" part of HSV - "value," meaning intensity).
Whether you use Value space or Hue space, as Penelope mentioned, will depend on the natural images you produce for your real objects. For the general case, you may need to use a combination of hue and value (intensity) to segment images properly. Rather than work in hue-value vector space, it's more straightforward to work in the H and V image planes separately and then combine results. (Segmentation in 3D vector spaces is certainly possible, but would probably be unnecessarily complicated for this project.)
The watershed algorithm in OpenCV could be a good match for your needs.
http://www.seas.upenn.edu/~bensapp/opencvdocs/ref/opencvref_cv.htm
One word of caution about Otsu's method: it's fine for separating two modes when a histogram of intensity values (or hue values) is a bimodal distribution, but for natural images it's not common to have true bimodal distributions. If the background and/or foreground objects vary in intensity and/or hue from one side of an object to another, then Otsu can perform poorly.
Otsu can certainly be extended for multiple modes, as is explained in Digital Image Processing by Gonzalez and Woods and other introductory textbooks on the subject. However, a background gradient will cause problems even if you use Otsu to separate one pair of modes at a time.
You also want to ensure that if your camera lens zooms in or out, you'll still find the same binarization thresholds. The basic Otsu technique uses all pixels in the image histogram. That means that you could scramble all of the pixels in the image to produce pure noise with the same image histogram as your original image, and Otsu's method would generate the same threshold.
One common trick is to rely on pixels near edges. In your example, each object can be treated as a region with sharp edges, sharp corners, and (hopefully) uniform HSV values. Sampling pixels near edges can be done in several ways, including the following:
- Find strong edge points (using Canny or some simpler technique). Along the direction of the edge gradient, and at distances +/- D from the edge point, sample the gray levels of the (relative) foreground and (relative) background. Distance D should be much smaller than the size of the objects in question.
- Find strong edge points. Use the gray levels at the edge points themselves as estimates of the likely desired threshold. In your example, you'll end up with two strong peaks: one at the edge between object1 and object2, and the other at the edge between object2 and object3.
Since your objects have corners, you can use those to help identify object boundaries and/or edge pixels suitable for sampling.
If you have nominally rectangular objects, you could also use a Hough edge or RANSAC edge algorithm to identify lines in the image, find intersections at corners, etc.
All that said, for nearly any natural image involving objects stacked on top of each other you're going to run into several complications:
- Shadows
- Color and intensity gradients across an object of nominally consistent color
- Edges of varying sharpness if objects are at varying distances from the optical system
If you know for certain how many objects are present, you can try a K Means technique.
http://aishack.in/tutorials/knearest-neighbors-in-opencv/
For more complex segmentation tasks, such as when the number of objects isn't known, you can use the Mean Shift technique, though I'd recommend trying simpler techniques first.
The first and easiest fix is proper lighting. To reduce reflections and shadows, use diffuse lighting. For many applications, the closest to ideally diffuse lighting is "cloudy day" lighting:
http://www.microscan.com/en-us/products/nerlite-machine-vision-lighting/cdi-illuminators.aspx
More simply, you could try one or more "bounce" lights such as those used in studio photography.
http://www.photography.com/articles/taking-photos/bounce-lighting/