I'm taking my first steps in building a Haar cascade for custom object recognition. I've spent time collecting a fair bit of data and have written some preprocessing scripts to convert videos to frames. My next step is to crop the object of interest to create positive training examples. I have a few questions that I have genuinely searched for answers to online, but I'm still slightly confused:
Aspect ratio: I read that I should aim to keep the aspect ratio the same. Does this mean the same as the original frame, or the same across all the images I want to use as positive training examples (i.e. crops from frames of completely different videos)?
Size: aspect ratio and size are obviously not the same thing. So do I also need to ensure my positive samples all have the same height and width? (I'm pretty sure they should, but thought it worth double-checking.)
Also on size: I have come across some blogs recommending, for instance, 24 x 24 (H x W). What if the object I want to detect is not square? In my case it is a rectangle whose height is around double its width, for instance a plastic bottle. Do I leave the size as it is, or should I convert it to 24 x 24?
Negative samples: should these all have the same aspect ratio and/or size?
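To make the sizing questions concrete, here is a rough sketch of what I imagine the cropping step would look like if the answer is "one fixed aspect ratio for every positive". It is only my guess at the approach, not something I have confirmed is best practice: the 1:2 width-to-height ratio comes from my bottle example, and the function name `expand_to_aspect` and the example box are made up for illustration.

```python
def expand_to_aspect(box, aspect_w, aspect_h):
    """Grow an (x, y, w, h) box around its centre so that w:h equals aspect_w:aspect_h.

    The box is only ever enlarged, never shrunk, so the object stays fully inside.
    Note: the result is not clamped to the image borders (my simplification).
    """
    x, y, w, h = box
    target = aspect_w / aspect_h          # desired width / height
    if w / h < target:
        # Box is too narrow for the target ratio: widen it, keep the height.
        new_w, new_h = h * target, h
    else:
        # Box is too wide (or already correct): grow the height, keep the width.
        new_w, new_h = w, w / target
    cx, cy = x + w / 2, y + h / 2         # centre of the original box
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


# Hypothetical hand-drawn box around a bottle: 40 px wide, 100 px tall.
crop = expand_to_aspect((100, 50, 40, 100), aspect_w=1, aspect_h=2)
print(crop)
```

With every crop forced to the same 1:2 shape like this, I assume they could then all be scaled to one common size (something like 24 wide by 48 tall rather than 24 x 24) without distorting the object, which is really what my questions boil down to.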
I understand this is probably a very basic/silly question, but it has been far from clear what best practice is here!
I have come across a couple of other answers on here, but I don't feel they offer a satisfactory answer, and the field has moved on significantly in the past couple of years.
Thanks