Segmentation with Single Point Class Annotations via Graph Cuts?
I have a dataset of images that I am trying to segment. For each image in the dataset, experts have randomly selected single pixels/points and annotated each with the class it belongs to. In other words, each image will have about 60 points labeled thus:

x, y, class

How can I best leverage the knowledge of these single pixel annotations to perform a good semantic segmentation?

A similar question was asked before and the response was to use graph cuts:

"hard" supervision in image segmentation with python

Graph cuts in theory seem like a good candidate, but would graph cuts work with single-pixel annotations? Furthermore, are there methods for it to work with a multiclass dataset? If so, is there a good library implementation or some good resources for this?

If not, what method would best fit this scenario? I played a bit with random walks but the resulting segmentation had poor edge localization (extremely rounded edges).

Any help, resources or examples you can give would be greatly appreciated (preferably with python libraries, but I can really work with anything).

EDIT: My dataset has about 10 different classes, and each image probably has about 5 on average. The annotators are not guaranteed to annotate every region, but it is rare for one to be missed (a few missing regions or incorrectly labeled regions are tolerable). The classes each correspond to texturally uniform areas and the textures are fairly constant (think sky, dirt, water, mountain). You can't get texture from a single point, but almost all regions should have multiple points annotated.

Pettit answered 5/8, 2017 at 1:41 Comment(6)
can you post two or three example images and the corresponding annotations? How many classes do you have in general? How many classes per image (roughly)? Do your annotators annotate ALL regions of the image? e.g., an image of a woman with a white shirt and black skirt standing on grass: will you have a single point inside the woman with class "person" and a second one on the grass? How would you expect GC in this scenario to know the white and black regions belong together?Dissent
I added an edit to my post to answer the first few questions. Also: The classes each correspond to texturally uniform areas (so shirt and skirt would be separate classes in your example). Since I am not sure that my data can be shared, here is data from a dataset with a similar setup and in a somewhat similar problem domain: adrix.com/data_sample.zip - my images will not have frames or lines through them like this one.Pettit
how many annotated images you have in total?Dissent
A few hundred in my datasetPettit
When you say "I can really work with anything" does it include deep-learning semantic segmentation models in caffe or tensorflow?Dissent
I haven't worked with deep learning for semantic segmentation models specifically before but I have worked with cnns and rnns in general (both training new models and working with pretrained models), so with some knowledge of where to look and a bit of documentation, I could figure it out. I've worked mostly in tensorflow so my caffe experience is fairly limited but I am not afraid to dive in if I need to.Pettit

An interesting problem. Since there is no concrete example to work on, I will only outline the algorithmic approaches I would try myself.


Approach #1: use dense descriptors

  • Compute dense image descriptors (e.g., SIFT/HOG/Gabor, or better yet, features from a pre-trained deep net like VGG).
  • Take the descriptors from all images at the annotated locations only: you should have ~10K descriptors with class labels. Train a simple classifier (e.g., SVM) on this set.
  • Go back to the images: apply the classifier and output the log-probability of each pixel belonging to each of the classes. This should be the unary term (aka "data term") for your graph-cut.
  • Locally modify the unary term to force the annotated points to belong to the right class.
  • Use a simple pairwise term (image gradients or some edge-based term).
  • Apply graph-cut to get the semantic segmentation (a rough sketch of this pipeline follows below).
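
A rough sketch of that pipeline, kept binary for brevity (multi-label graph cuts need alpha-expansion, e.g., via a GCO wrapper). It assumes scikit-image, scikit-learn and PyMaxflow are available; the blur-stack features, the smoothness weight lam and the edge scaling are placeholder choices, not part of the recipe above:

    import numpy as np
    import maxflow                                     # PyMaxflow
    from skimage.filters import gaussian, sobel
    from sklearn.svm import SVC

    def dense_features(img):
        # Toy per-pixel descriptor: intensity at a few blur scales.
        # Swap in SIFT/HOG/Gabor or deep features as suggested above.
        return np.stack([gaussian(img, s) for s in (1, 2, 4)], axis=-1)

    def segment(img, points, labels, lam=50.0):
        # img: (H, W) grayscale; points: (K, 2) array of (x, y); labels: (K,) in {0, 1}
        feats = dense_features(img)
        X = feats[points[:, 1], points[:, 0]]          # descriptors at annotated points
        clf = SVC(probability=True).fit(X, labels)     # simple classifier

        H, W = img.shape
        logp = clf.predict_log_proba(feats.reshape(-1, feats.shape[-1]))
        logp = logp.reshape(H, W, 2)                   # per-pixel unary term

        g = maxflow.Graph[float]()
        nodes = g.add_grid_nodes((H, W))
        # Pairwise term: cuts are cheaper where the image gradient is strong.
        g.add_grid_edges(nodes, lam * np.exp(-10.0 * sobel(img)))
        # Unary term as t-edge capacities (-log-probabilities are >= 0);
        # check PyMaxflow's source/sink polarity against your label convention.
        g.add_grid_tedges(nodes, -logp[..., 1], -logp[..., 0])
        g.maxflow()
        return g.get_grid_segments(nodes)              # boolean segmentation mask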

Approach #2: train your own deep semantic segmentation model

In order to train a fully convolutional model for segmentation you don't necessarily have to have labels for all pixels. You may have an "ignore_label": pixels with that label are ignored and do not contribute to the loss.
Your case is an extreme case of "ignore_label" - you only have ~60 labeled pixels per image. Nevertheless, it may be interesting to see what you can learn from such sparse information (see the loss sketch below).
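
A minimal sketch of such a sparse loss, assuming TensorFlow (which you said you mostly work in); the IGNORE value and the tensor layout are assumptions:

    import tensorflow as tf

    IGNORE = 255  # label value for all unannotated pixels (assumption)

    def sparse_point_loss(logits, labels):
        # logits: (N, H, W, C); labels: (N, H, W), filled with IGNORE
        # everywhere except at the ~60 annotated points per image.
        mask = tf.not_equal(labels, IGNORE)
        safe_labels = tf.where(mask, labels, tf.zeros_like(labels))
        per_pixel = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=safe_labels, logits=logits)
        # Zero out the loss at ignored pixels, average over labeled ones.
        per_pixel = tf.where(mask, per_pixel, tf.zeros_like(per_pixel))
        n_labeled = tf.reduce_sum(tf.cast(mask, tf.float32))
        return tf.reduce_sum(per_pixel) / tf.maximum(n_labeled, 1.0)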

Come to think of it, you have more information per image than just the labeled points:

My dataset has about 10 different classes, each image probably has about 5 on average

Meaning that if an image has labels for classes 1..5, you know it does not contain classes 6..10 (!) You may have a "positive term" in the loss assigning the very few labeled pixels to the right classes, and a "negative term" for all the rest of the pixels that penalizes them for being assigned to classes not present in the image at all (sketched below).
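
One possible form for that negative term, again assuming TensorFlow; the weight and the way the `absent` mask is built from the annotations are assumptions:

    import tensorflow as tf

    def negative_term(logits, absent, weight=0.1):
        # logits: (N, H, W, C); absent: (N, C) float mask, 1.0 for each
        # class with no annotated point in that image (assumed given).
        probs = tf.nn.softmax(logits, axis=-1)
        # Probability mass each pixel puts on classes known to be absent.
        absent_mass = probs * absent[:, tf.newaxis, tf.newaxis, :]
        return weight * tf.reduce_mean(tf.reduce_sum(absent_mass, axis=-1))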

Dissent answered 7/8, 2017 at 15:24 Comment(2)
Depending on the way annotators placed their markers, you might be able to "dilate" the points a bit to earn a few extra labeled pixels per image. This will not be sufficient to be considered as "scribbles" but it might add some extra information to the methods proposed here.Goldner
This is great, I am pursuing Approach #2. Similar to what @Goldner suggested, I am oversegmenting with superpixels and then using the annotations to label certain superpixels giving me a bit more data to work with.Pettit
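
For reference, a minimal sketch of the superpixel labeling trick from the last comment, assuming scikit-image (the SLIC parameters and the ignore value are placeholders):

    import numpy as np
    from skimage.segmentation import slic

    def grow_point_labels(img, points, labels, ignore=255, n_segments=500):
        # img: (H, W, 3) RGB; points: (K, 2) array of (x, y); labels: (K,)
        sp = slic(img, n_segments=n_segments)    # oversegment into superpixels
        out = np.full(sp.shape, ignore, dtype=np.int32)
        for (x, y), lab in zip(points, labels):
            out[sp == sp[y, x]] = lab            # label the whole superpixel
        return out                               # denser label map for training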
