To set the stage: a district has 200+ analog cameras connected to a central monitoring station, but these cameras have no face or object recognition. Is it possible to add face detection to these cameras? Are there any prerequisites the cameras must meet? How can I load the images these cameras send and process them?
As your question does not address a specific coding issue but rather the general concept of how to solve a pattern recognition task, I would like to give you an overview of the steps to consider. As you may have noticed, I wrote something about pattern recognition (PR). Well, that's what you want to do, besides pattern analysis, which is actually step 2.
Let's start off with a PR pipeline like this:
1. Signal acquisition
To analyze a signal, you need a signal in the first place. Collecting it is the first step, and let me tell you one thing: there is not much you can do right here, but a lot you can do wrong. Since you want to access a video signal from analog cameras, the first thing to do is an A/D conversion, so that you get a digital signal to work with. I have no idea about the video quality of your cameras, but keep one thing in mind: the signal you feed into your feature extraction (FE) tool will almost certainly be a single frame per FE step. So check the quality of a single frame, not of the entire video; judging the entire video might give you a wrong impression. Apart from that, you cannot really influence your signal acquisition. I did not write anything about how to access the analog video from the cameras, simply because there is no information about their environment.
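One way to check the quality of a single frame, rather than the whole video, is a sharpness score. The sketch below computes the variance of the Laplacian, a common blur metric that is my choice here, not something your cameras or any standard prescribes. It assumes the frame has already been digitized to an 8-bit grayscale image stored as a list of rows; `laplacian_variance` is a hypothetical helper name. A blurry or low-contrast frame scores low, but a sensible threshold has to be chosen from your own footage.

```python
# Sketch only: score the sharpness of ONE digitized grayscale frame.
# Assumption: the frame is a list of rows of 8-bit intensities (0-255).

def laplacian_variance(frame):
    """Variance of the 4-neighbour Laplacian over the frame's interior pixels.

    Higher values mean more edge detail; blurry or flat frames score low.
    """
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1]
                   - 4 * frame[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# A checkerboard (maximal detail) versus a uniform grey frame (no detail).
sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
print(laplacian_variance(sharp))  # 1040400.0
print(laplacian_variance(flat))   # 0.0
```

In practice you would score a sample of frames from each camera and compare them, since the absolute value depends on resolution, scene content, and lighting.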
2. Signal pre-processing
Once you have a digital signal (you can pre-process analog signals as well, of course, but for the sake of simplicity I will only cover digital pre-processing here), you want to get the most out of it. What does this mean? Every signal $X$ you collect (or observe) is basically a mixture of the true (uncorrupted) signal $S$ and some sort of noise $N$:
$$X = S + N$$
What you ideally want to have is $S$. If you manage to decrease the noise $N$, you reduce its contribution to the overall signal $X$ that you have. Therefore, pre-processing often involves some kind of filtering. For an image, you could for example apply a median filter to a single frame to deal with salt-and-pepper noise. But signal processing has many different aspects, and you will have to read up on the topic for your particular task. Long story short: you want to increase your signal quality by reducing noise, artifacts, and so on.
3. Feature extraction / classification
Now you have a signal of sufficient quality (this is an assumption; I do not know anything about your particular setup). Next comes feature extraction. What is that? Even a single video frame contains a lot of information. Say you have a resolution of 720×576 pixels: that is already over 400,000 values, and this is not even good image quality. First of all, not every pixel is interesting for what you want to do. Considering only the face recognition task: you want the pixels that show a human face, and all other pixels are less interesting to you. So you have to do two things: first detect a face, and then process it further to identify the person. For general face detection there are different approaches, such as pattern matching, texture matching, or a convolutional neural network (CNN). After a successful face detection, what do you do with the face? Again, you have different options, such as eigenfaces, the scale-invariant feature transform (SIFT), or once more a CNN.
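The pixel arithmetic and the "only the face pixels matter" idea can be sketched as follows. `match_template` is a naive template matcher (sliding window, sum of squared differences), one simple form of the pattern matching mentioned above. It is an illustration only: a real face detector must cope with varying scale, pose, and lighting and is normally a trained model. All names and the toy data are hypothetical.

```python
# Sketch only: a naive template matcher, NOT a production face detector.
# Frames are hypothetical grayscale images stored as lists of rows.

def frame_values(width, height, channels=1):
    """Number of values in one frame (channels=3 for a colour frame)."""
    return width * height * channels

def match_template(frame, template):
    """Return (x, y) of the top-left corner of the window with the smallest
    sum of squared differences (SSD) to the template."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best = None  # (ssd, x, y)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            ssd = 0
            for ty in range(th):
                for tx in range(tw):
                    d = frame[y + ty][x + tx] - template[ty][tx]
                    ssd += d * d
            if best is None or ssd < best[0]:
                best = (ssd, x, y)
    return best[1], best[2]

def crop(frame, x, y, w, h):
    """Cut out the w x h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]

print(frame_values(720, 576))     # 414720 grayscale values
print(frame_values(720, 576, 3))  # 1244160 values for RGB

# Toy example: hide a 3x3 "face" pattern at (x=12, y=5) in a 20x20 frame.
pattern = [[9, 0, 9],
           [0, 9, 0],
           [9, 0, 9]]
frame = [[0] * 20 for _ in range(20)]
for ty, row in enumerate(pattern):
    for tx, value in enumerate(row):
        frame[5 + ty][12 + tx] = value

x, y = match_template(frame, pattern)
face = crop(frame, x, y, 3, 3)
print((x, y))  # (12, 5)
```

Once the face region is located, only its pixels (9 here, instead of 400) are passed on to the recognition stage.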
Additionally, your classification algorithms need training. Training an algorithm means optimizing its parameters with respect to a certain goal. And here it gets really tricky: not only do you need sample data (you could collect this from your cameras), you also need labels. For face detection, you can get some decent pre-trained algorithms, which should also work on your data. For face recognition, however, you would need samples of every person you want to recognize, and you have to annotate them. You will need plenty of them, ideally captured with the same cameras you will use for recognition. Again, there are pre-trained models, but you still have to show them the new faces you would like to recognize.
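Showing a pre-trained model new faces often boils down to storing labelled feature vectors and comparing new faces against them. The sketch below assumes some pre-trained model (for example a CNN) has already turned each face crop into a fixed-length feature vector; the three-dimensional vectors, the names, and the 0.8 similarity threshold are made up for illustration. Each person's samples are averaged into one template, and a query is assigned to the most similar template by cosine similarity, or reported as unknown if no template is similar enough.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm

def enroll(samples):
    """Average the labelled feature vectors of each person into one template."""
    gallery = {}
    for name, vectors in samples.items():
        dim = len(vectors[0])
        gallery[name] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return gallery

def recognize(gallery, query, threshold=0.8):
    """Return the best-matching name, or "unknown" if the best score is too low."""
    best_name, best_score = None, -1.0
    for name, template in gallery.items():
        score = cosine_similarity(query, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "unknown"

# Hypothetical annotated samples: feature vectors from a pre-trained model.
samples = {
    "alice": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "bob":   [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]],
}
gallery = enroll(samples)
print(recognize(gallery, [0.97, 0.12, 0.03]))  # alice
print(recognize(gallery, [0.0, 0.0, 1.0]))     # unknown
```

The threshold trades false matches against missed matches and would have to be tuned on annotated data from your own cameras.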
This whole pipeline assumes that your input signal is of sufficient quality, which I am not sure about. If it is, this gives you a quick guideline on the steps your task requires. If step 1 is successful, step 2 is not a big deal anymore. Step 3 will be the most challenging, because you need training data.
I did not write anything regarding privacy laws in different countries. I simply expect you to respect whatever restrictions you might be facing!