Haar cascade positive example image sizing

I'm taking my first steps in making a Haar cascade for custom object recognition. I've spent time gathering a fair bit of data and wrote some preprocessing scripts to convert videos to frames. My next step is to crop the object of interest in order to create some positive training examples. I have a few questions which I genuinely have looked around online for answers to, but I'm still slightly confused:

I read I should aim to keep the aspect ratio the same. Does this mean the same as the original frame, OR the same across all images I want to use as positive training examples (i.e. frames from completely different videos)?

Size: aspect ratio and sizing are obviously not the same thing. So, again, do I need to ensure my positive samples are all the same height and width? (I'm pretty sure they should be, but thought it worth double checking.)

Also in terms of size: I have come across some blogs recommending, for instance, 24 x 24 (H x W). What if the object I want to detect is not square? In my case it's a rectangle whose height is around double its width (for instance, a plastic bottle). Do I leave the size as it is, or should I convert it to 24 x 24?

Negative samples: should these all be the same aspect ratio and/or size?

I understand this is probably a very low-level/silly question, but it's been far from clear what best practice is here!

I have come across a couple of other answers on here, but I don't feel they offer a satisfactory answer, and the field has moved on significantly in the past couple of years.

Thanks

Suu answered 7/11, 2016 at 22:45

The positive samples are packed into a .vec file, which is needed for the training. The createsamples binary will create such a .vec file and automatically scale your defined object regions (defined in a .txt file) to the target format. All your positive sample object regions should have roughly the same aspect ratio, because the automatic scaling would otherwise distort them.

The target size should be the minimum size at which you want to detect an object (but if it is too small, the relevant features won't be there anymore), and its aspect ratio should match the aspect ratio of your object regions.

For example: you have a lot of images of cups. The image resolutions vary, but the aspect ratio of each cup (only the cup region within the image, not the background) is about 1:2 (width:height). So you either crop each image so it holds only the cup plus minimal background, write the cropped image to the .txt file, and list the full extent of the cropped image as the ROI there; or you select the ROI of the cup within the full-size image, add the full-size image to the .txt file, and list that ROI region there. Then you select a target size like 20x40, or 10x20, or whatever 1:2 aspect ratio size you think can be trained.
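As a sketch of the second option, each line of the .txt annotation file follows the pattern `filename N x y w h [x y w h ...]`. A tiny helper for producing such lines (the function name is mine; the coordinates below are just the example values used further down) might look like:

```python
def annotation_line(filename, rois):
    """Format one line of a createsamples annotation .txt file.

    rois is a list of (x, y, w, h) tuples, one per object in the image.
    """
    parts = [filename, str(len(rois))]
    for (x, y, w, h) in rois:
        parts += [str(x), str(y), str(w), str(h)]
    return " ".join(parts)

# One cup in image1.jpg:
print(annotation_line("image1.jpg", [(653, 154, 1295, 1523)]))
# image1.jpg 1 653 154 1295 1523
```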

The negative samples should stay as they are; the training will automatically choose and search subimages of those samples. Just make sure there are no cups (per the example) in them.
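For the negatives, the usual convention is a plain background description file with one image path per line (the name bg.txt and the paths here are just an illustration):

```
neg/frame_0001.jpg
neg/frame_0002.jpg
neg/frame_0003.jpg
```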

I've had some good results drawing black boxes over the objects in the positive samples and using the resulting images as additional negative samples, but whether that helps may depend on your particular task.
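A sketch of that trick, assuming the images are already loaded as numpy arrays (as OpenCV's imread returns); the helper name and the toy coordinates are mine:

```python
import numpy as np

def blackout(image, rois):
    """Return a copy of image with each (x, y, w, h) region filled black,
    so a positive frame can be reused as an extra negative sample."""
    out = image.copy()
    for (x, y, w, h) in rois:
        out[y:y + h, x:x + w] = 0  # fill the object region with black
    return out

# Toy example: an all-white 100x100 "image" with one object region removed.
img = np.full((100, 100, 3), 255, dtype=np.uint8)
neg = blackout(img, [(10, 20, 30, 40)])
```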

As a more concrete example, I've taken two cup images from Wikimedia: [cup image 1] and [cup image 2].

There is one cup in the first image and two cups in the second image. I've chosen not to use the handle during training and chose an aspect ratio of 0.85 (1:1.176, w:h). Now you can either write the ROIs to the .txt file, like:

image1.jpg 1 653 154 1295 1523
image2.jpg 2 1068 406 1551 1824 3036 1159 852 1004

Or you can first crop the images to those ROIs: [cropped cup 1], [cropped cup 2], [cropped cup 3]

and then create a .txt file like this:

cropped_image1_cup1.jpg 1 0 0 1295 1523
cropped_image2_cup1.jpg 1 0 0 1551 1824
cropped_image2_cup2.jpg 1 0 0 852 1004

Both should create the same .vec file (as long as the cropping didn't introduce artifacts like additional JPEG compression; better to use PNG ;) ).
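In either case, the .vec generation itself is done by the opencv_createsamples binary. As a sketch, the call can be assembled like this (the file names, the -num count, and the target size are assumptions for illustration):

```python
import subprocess

cmd = [
    "opencv_createsamples",
    "-info", "positives.txt",  # annotation file in the format shown above
    "-vec", "positives.vec",   # output consumed by opencv_traincascade
    "-num", "83",              # total number of object regions listed
    "-w", "20",                # target width
    "-h", "24",                # target height (1:1.2 aspect ratio)
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment once the OpenCV apps are installed
```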

You could then choose the target size to be 20x24, for example (aspect ratio 1:1.2). It is worth coding a script or tool that fixes the aspect ratio of your labeled input regions: it is much easier and more intuitive to label objects as they are, rather than with a perfect aspect ratio, and then postprocess by adjusting the ROIs to fit the target aspect ratio (adding some extra background at left/right or top/bottom if necessary). Or ignore the aspect ratio difference, if some deformation is acceptable for you.
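A minimal sketch of such a postprocessing step (fit_aspect is a hypothetical helper; clamping the result to the image border is left out for brevity):

```python
def fit_aspect(x, y, w, h, target_ratio):
    """Grow an ROI so that w/h == target_ratio, keeping its center fixed.

    target_ratio is width/height, e.g. 20/24 for a 20x24 training window.
    Only grows (adds background); never shrinks the labeled region.
    """
    if w / h < target_ratio:          # too narrow: pad left/right
        new_w = round(h * target_ratio)
        x -= (new_w - w) // 2
        w = new_w
    else:                             # too wide (or exact): pad top/bottom
        new_h = round(w / target_ratio)
        y -= (new_h - h) // 2
        h = new_h
    return x, y, w, h

# A 100x300 label adjusted to a 20:24 target ratio:
print(fit_aspect(50, 50, 100, 300, 20 / 24))
# (-25, 50, 250, 300): the ratio is now 250/300 == 20/24
```

A negative x or y after padding means the ROI ran off the image edge, which is exactly where you would clamp (or shift) in a real script.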

Demmy answered 8/11, 2016 at 7:32
Hi, thanks, this is a super helpful reply. I will go back through and implement the suggested changes! I went ahead and ran it anyway in the first instance; however, it ended up hanging at stage 15 of 20. I'm just trying to find out how to compile an xml from the stages it did complete! Suu
Call your exact same command line again but change numStages from 20 to 14 (stages 0 to 13 previously computed) or 15 (0 to 14 previously computed). It will load the previously computed stages up to that number and write the xml. Demmy
OK, so I got it to generate a Haar xml for 13 of the stages. It works (kind of), but it's not massively sensitive. I have a couple of options here: 1. add more positives and negatives and train in the cloud for faster training time and (hopefully) improved accuracy; 2. try a different approach (likely in the cloud too) via a convolutional neural net. Again this seems feasible, and I have a fair bit of data to feed in. Suu
One final clarification that might help here: when I cropped my original images they were not rescaled up (like yours in the example); they stayed at the size of the crop taken from the original image (if that makes sense). I.e. the SWL mug is much bigger in the second example than in the original image which includes the background. Now I'm also wondering if I should have made those original positives bigger after cropping! Suu
How many positives and negatives did you provide? With the createsamples binary you can create additional positive samples by applying geometric transformations. For more sensitivity you'll need more positives and a bigger minHitRatio, and for more specificity you'll need (much) more negatives and more stages, or a lower maxFalseAlarms parameter. I don't have experience with tree splits and varying window sizes yet. Demmy
OK, so I gave it 83 positives and 1030 negatives. All the negatives were highly similar, which in retrospect I'm thinking could be an issue! Suu
The size of the positive samples doesn't matter (aside from the target aspect ratio and including a minimum of background); they can be anything bigger than your target size. The createsamples binary has a parameter to show the created positives; try that to be sure they make sense. Demmy
Getting more positives (and negatives) actually won't be an issue at all. I managed to get a decent chunk of data, but I knew the first run through the loop would put my laptop on downtime for a bit, so I just wanted to get a proof of concept. I think the next rational thing to do is add both more positives and negatives and then do things in the cloud. Suu
OK, also re: image sizing, I read that Haar cascades can't scale down, so my thinking was to go for smaller-sized images and let the algorithm scale up. If I move things to the cloud I can also try a CNN and see how that performs. Suu
Yes, the negatives should vary a lot (unless your target task's background is constant), and use many more positives if possible. In my very specialized task I used 15000 positive and 60000 negative images in total, with 13000 positives & 26000 negatives per stage. Demmy
OK, so in the use case I have in mind the target background won't be constant, but it will be highly similar. That said, in this case the goal was just to prove the concept and keep moving things ahead. I think for a first go I'm happy with the results, and yeah, with numbers like that I think I definitely need to go via the cloud :) Thanks for the feedback, it's been interesting. If you have a blog, let me know! Suu