How to Handle Occlusion and Fragmentation

I am trying to implement a people-counting system using computer vision for a university project. Currently, my method is:

  1. Background subtraction using MOG2
  2. Morphological filter to remove noise
  3. Track blobs
  4. Count blobs crossing a specified region (a line)
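
For illustration, step 4 could be sketched roughly as below. This is only a minimal sketch under assumptions: the counting line is horizontal, the tracker already provides one centroid per frame for each blob, and `count_crossings` and the `tracks` layout are hypothetical names, not part of OpenCV.

```python
def count_crossings(tracks, line_y):
    """Count tracks crossing a horizontal line at y = line_y.

    tracks maps a (hypothetical) track id to its list of (x, y)
    centroids, one entry per frame. Returns (down, up) counts.
    """
    down = up = 0
    for positions in tracks.values():
        # Compare each centroid with the one from the previous frame.
        for (_, y_prev), (_, y_curr) in zip(positions, positions[1:]):
            if y_prev < line_y <= y_curr:
                down += 1  # moved from above the line to on/below it
            elif y_curr < line_y <= y_prev:
                up += 1    # moved from on/below the line to above it
    return down, up


# Made-up centroids, purely for demonstration.
tracks = {
    1: [(50, 80), (52, 95), (53, 110)],      # crosses y=100 going down
    2: [(120, 130), (118, 104), (117, 90)],  # crosses y=100 going up
    3: [(200, 20), (205, 40)],               # never reaches the line
}
print(count_crossings(tracks, line_y=100))  # (1, 1)
```

Note that a merged blob still produces a single track here, which is exactly why a group gets counted as one person.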

The problem is that when people arrive as a group, my method counts them as a single person. From my reading, I believe this is called occlusion. Another problem is that when a person looks similar to the background (for example, wearing dark clothing while passing a black pillar or wall), the blob splits into several pieces even though it is actually one person.

From what I have read, I should implement a detector + tracker (e.g. detect humans using HOG). But my detection results are poor (e.g. 50% false positives with a 50% hit rate, using both the OpenCV human detector and my own trained detector), so I am not convinced that I should use the detector as the basis for tracking. Thanks for your answers and for taking the time to read this post!

Arianna answered 26/4, 2013 at 8:54 Comment(0)
P
5

There is no single "good" answer to this, as handling occlusion and background subtraction are both still open problems. Still, a few pointers might help you along with your project.

You want to detect if a "blob" is one person or a group of people. There are several things you could do to handle this.

  • Use multiple cameras (it's unlikely that a group of people is detected as a single blob from all angles)
  • Try to detect parts of the human body. If you detect two heads in a single blob, the blob contains multiple people. The same goes for three legs, five shoulders, and so on.
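
As a rough sketch of the second idea, assuming you already have some head detector (hypothetical here, it could be any part detector) that returns boxes as `(x, y, w, h)`, you could count how many head centres fall inside each blob's bounding box. The function name and the sample boxes are made up for illustration.

```python
def heads_per_blob(blobs, heads):
    """Count head detections whose centre lies inside each blob box.

    blobs and heads are lists of (x, y, w, h) boxes in pixels.
    Returns one count per blob, in the same order as blobs.
    """
    counts = []
    for bx, by, bw, bh in blobs:
        n = 0
        for hx, hy, hw, hh in heads:
            cx, cy = hx + hw / 2, hy + hh / 2  # centre of the head box
            if bx <= cx < bx + bw and by <= cy < by + bh:
                n += 1
        counts.append(n)
    return counts


# Made-up boxes: the first blob contains two heads, the second one.
blobs = [(10, 10, 100, 200), (300, 20, 50, 180)]
heads = [(20, 15, 30, 30), (70, 18, 30, 30), (310, 25, 30, 30)]

# Assumption: a blob with no detected head still counts as one person.
people = [max(1, n) for n in heads_per_blob(blobs, heads)]
print(people)  # [2, 1]
```

How well this works depends entirely on the quality of the part detector, so treat the counts as a hint rather than ground truth.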

For tracking a "lost" person (one walking behind another object), one approach is to extrapolate their position. You know that a person can only move so far between frames. Taking this into account, it is implausible for a person to be detected in the middle of your image and then suddenly disappear. After several frames of not seeing that person, you can discard the track, as the person has had enough time to move out of view.
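
A minimal sketch of that extrapolation idea, assuming a constant-velocity model in pixel coordinates. The `Track` class, `max_missed` and `max_step` are hypothetical names and values chosen for illustration, and would need tuning for a real camera and frame rate.

```python
import math


class Track:
    """One tracked person, coasting on its last velocity while unseen."""

    def __init__(self, position, max_missed=5):
        self.position = position      # last known or extrapolated (x, y)
        self.velocity = (0.0, 0.0)    # pixels per frame
        self.missed = 0               # consecutive frames without a match
        self.max_missed = max_missed  # assumption: tune per scene

    def predict(self):
        """Extrapolate the position one frame ahead."""
        x, y = self.position
        vx, vy = self.velocity
        return (x + vx, y + vy)

    def is_plausible(self, detection, max_step=40.0):
        """A person can only move so far between frames."""
        px, py = self.predict()
        return math.hypot(detection[0] - px, detection[1] - py) <= max_step

    def update(self, detection):
        """Feed a matched (x, y) detection, or None if the person is unseen.

        Returns False once the track has been missing for too long.
        """
        if detection is None:
            self.position = self.predict()  # keep moving behind the occluder
            self.missed += 1
        else:
            x, y = self.position
            self.velocity = (detection[0] - x, detection[1] - y)
            self.position = detection
            self.missed = 0
        return self.missed <= self.max_missed


track = Track((100.0, 50.0), max_missed=3)
track.update((104.0, 50.0))   # seen: velocity becomes (4, 0)
track.update(None)            # occluded: position extrapolated to (108, 50)
track.update(None)            # occluded: position extrapolated to (112, 50)
print(track.position, track.is_plausible((117.0, 51.0)))
```

When the person reappears near the predicted position, the detection can be matched back to the same track instead of starting a new one.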

Philosopher answered 26/4, 2013 at 9:44 Comment(2)
I see, so what I need is to improve and modify my detector, from a full-human detector to a parts-based human detector. I will read more about that, because I am more familiar with detection (using HOG, LBP, Latent SVM) than with multiple cameras. Thanks Nallath! - Arianna
It's a bit like the so-called "bag of words" model. - Philosopher
E
6

Tracking people in video surveillance sequences is still an open problem in the research community. However, particle filters (PF), also known as sequential Monte Carlo methods, give good results under occlusion and in complex scenes. You should read this. There are also extra links to example source code after the bibliography.

An advantage of using a PF is the reduced computation time compared with pure tracking-by-detection.
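
To give a feel for the idea, here is a very small bootstrap particle filter sketch. It is a toy under loud assumptions: the state is a single coordinate (say, position along a corridor), the motion model is a random walk, the measurement likelihood is Gaussian, and `motion_std` and `meas_std` are made-up values. It is not the method from the linked article.

```python
import math
import random


def particle_filter_step(particles, measurement,
                         motion_std=2.0, meas_std=5.0, rng=random):
    """One predict / weight / resample cycle of a bootstrap particle filter.

    particles: list of candidate positions (floats).
    measurement: observed position, or None while the person is occluded.
    """
    # 1. Predict: diffuse each particle with a random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    if measurement is None:
        # Occluded: nothing to weight against, so uncertainty simply grows.
        return predicted
    # 2. Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # All weights underflowed; keep the prediction rather than fail.
        return predicted
    # 3. Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Point estimate: the mean of the particle cloud."""
    return sum(particles) / len(particles)


rng = random.Random(0)  # seeded only to make the demo repeatable
particles = [rng.uniform(0.0, 200.0) for _ in range(500)]
for z in [100.0, 103.0, None, None, 112.0]:  # two occluded frames
    particles = particle_filter_step(particles, z, rng=rng)
print(round(estimate(particles), 1))
```

During the two `None` frames the cloud just spreads out, and it tightens again once a measurement comes back, which is what makes the approach tolerant of short occlusions.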

If you go this way, feel free to ask if you want a better understanding of the maths behind the PF.

Elanaeland answered 26/4, 2013 at 10:20 Comment(1)
Thanks for the heads up, @Eric. I found several related articles and discussions about people tracking that mention particle filters: #15873984 and https://mcmap.net/q/1010734/-tracking-blobs-with-opencv/… I will surely delve deeper into the subject! PS. I guess you mean particle filter, not particular filter. - Arianna